Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-15 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-08-15 12:51:28)
> 
> On 14/08/2018 16:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> > Looks like we should be able to hook this up to a selftest to confirm
> > the modification does land in the target context image, and a SRM to
> > confirm it loaded.
> 
> Lionel wrote an IGT which reads it back via SRM so that should be 
> covered. I will be posting the IGT counterpart at some point as well, 
> ideally when the agreement on the i915 side is there.

Wouldn't you rather have both? I know I would :-p

The emphasis is slightly different, IGT wants to cover the ABI
behaviour, selftests looking towards the edge cases hard to reach
otherwise. But even a trivial selftest would let us know the series
functions as intended.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-15 Thread Tvrtko Ursulin


On 14/08/2018 16:22, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)

+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine,
+ struct intel_sseu sseu)
+{
+   struct drm_i915_private *i915 = ctx->i915;
+   struct i915_request *rq;
+   struct intel_ring *ring;
+   int ret;
+
+   lockdep_assert_held(>drm.struct_mutex);
+
+   /* Submitting requests etc needs the hw awake. */
+   intel_runtime_pm_get(i915);
+
+   i915_retire_requests(i915);
+
+   /* Now use the RCS to actually reconfigure. */
+   engine = i915->engine[RCS];
+
+   rq = i915_request_alloc(engine, i915->kernel_context);
+   if (IS_ERR(rq)) {
+   ret = PTR_ERR(rq);
+   goto out_put;
+   }
+
+   ret = engine->emit_rpcs_config(rq, ctx, sseu);
+   if (ret)
+   goto out_add;
+
+   /* Queue this switch after all other activity */
+   list_for_each_entry(ring, >gt.active_rings, active_link) {
+   struct i915_request *prev;
+
+   prev = last_request_on_engine(ring->timeline, engine);
+   if (prev)
+   i915_sw_fence_await_sw_fence_gfp(>submit,
+>submit,
+I915_FENCE_GFP);
+   }
+
+   i915_gem_set_global_barrier(i915, rq);
+
+out_add:
+   i915_request_add(rq);
+out_put:
+   intel_runtime_pm_put(i915);
+
+   return ret;


Looks like we should be able to hook this up to a selftest to confirm
the modification does land in the target context image, and a SRM to
confirm it loaded.


Lionel wrote an IGT which reads it back via SRM so that should be 
covered. I will be posting the IGT counterpart at some point as well, 
ideally when the agreement on the i915 side is there.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-15 Thread Tvrtko Ursulin


On 14/08/2018 19:53, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-08-14 19:44:09)


On 14/08/2018 15:59, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

  struct drm_i915_gem_context_param arg;
  struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
  .instance = 0, };

  memset(, 0, sizeof(arg));
  arg.ctx_id = ctx;
  arg.param = I915_CONTEXT_PARAM_SSEU;
  arg.value = (uintptr_t) 
  if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
  sseu.packed.subslice_mask = 0;

  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
  }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a 
global
  dependency (Chris)
  Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
  Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
  s/dev_priv/i915/ (Tvrtko)
  Change uapi class/instance fields to u16 (Tvrtko)
  Bump mask fields to 64bits (Lionel)
  Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
  reconfiguring powergated slice/subslices (Chris)
  Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
   * Update for upstream changes.
   * Request submit needs a RPM reference.
   * Reject on !FULL_PPGTT for simplicity.
   * Pull out get/set param to helpers for readability and less indent.
   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
 on the same timeline and avoid GEM_BUG_ON.
   * No need to explicitly assign a NULL pointer to engine in legacy mode.
   * No need to move gen8_make_rpcs up.
   * Factored out global barrier as prep patch.
   * Allow to only CAP_SYS_ADMIN if !Gen11.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
Cc: Dmitry Rogozhkin 
Cc: Tvrtko Ursulin 
Cc: Zhipeng Gong 
Cc: Joonas Lahtinen 
Signed-off-by: Tvrtko Ursulin 
---
   drivers/gpu/drm/i915/i915_gem_context.c | 187 +++-
   drivers/gpu/drm/i915/intel_lrc.c|  55 +++
   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
   include/uapi/drm/i915_drm.h |  43 ++
   4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 8a12984e7495..6d6220634e9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
  return 0;
   }
   
+static int

+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine,
+ struct intel_sseu sseu)
+{
+   struct drm_i915_private *i915 = ctx->i915;
+   struct i915_request *rq;
+   struct intel_ring *ring;
+   int ret;
+
+   lockdep_assert_held(>drm.struct_mutex);
+
+   /* Submitting requests etc needs the hw awake. */
+   intel_runtime_pm_get(i915);
+
+   i915_retire_requests(i915);


?


I wondered myself but did not make myself dig through all the history of
the series. Cannot think that it does anything useful in the current design.


+
+   /* Now use the RCS to actually reconfigure. */
+   

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-08-14 19:44:09)
> 
> On 14/08/2018 15:59, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> >> From: Chris Wilson 
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and forcing
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices it is not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>  struct drm_i915_gem_context_param arg;
> >>  struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>  .instance = 0, };
> >>
> >>  memset(, 0, sizeof(arg));
> >>  arg.ctx_id = ctx;
> >>  arg.param = I915_CONTEXT_PARAM_SSEU;
> >>  arg.value = (uintptr_t) 
> >>  if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) 
> >> {
> >>  sseu.packed.subslice_mask = 0;
> >>
> >>  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
> >>  }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> >> (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel context 
> >> (Chris)
> >>
> >> v7: Synchronize the requests following a powergating setting change using 
> >> a global
> >>  dependency (Chris)
> >>  Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
> >>  Disable RPCS configuration setting for non capable users 
> >> (Lionel/Tvrtko)
> >>
> >> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
> >>  s/dev_priv/i915/ (Tvrtko)
> >>  Change uapi class/instance fields to u16 (Tvrtko)
> >>  Bump mask fields to 64bits (Lionel)
> >>  Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> >>
> >> v9: Import context image into kernel context's ppgtt only when
> >>  reconfiguring powergated slice/subslices (Chris)
> >>  Use aliasing ppgtt when needed (Michel)
> >>
> >> Tvrtko Ursulin:
> >>
> >> v10:
> >>   * Update for upstream changes.
> >>   * Request submit needs a RPM reference.
> >>   * Reject on !FULL_PPGTT for simplicity.
> >>   * Pull out get/set param to helpers for readability and less indent.
> >>   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
> >> on the same timeline and avoid GEM_BUG_ON.
> >>   * No need to explicitly assign a NULL pointer to engine in legacy mode.
> >>   * No need to move gen8_make_rpcs up.
> >>   * Factored out global barrier as prep patch.
> >>   * Allow to only CAP_SYS_ADMIN if !Gen11.
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Issue: https://github.com/intel/media-driver/issues/267
> >> Signed-off-by: Chris Wilson 
> >> Signed-off-by: Lionel Landwerlin 
> >> Cc: Dmitry Rogozhkin 
> >> Cc: Tvrtko Ursulin 
> >> Cc: Zhipeng Gong 
> >> Cc: Joonas Lahtinen 
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 187 +++-
> >>   drivers/gpu/drm/i915/intel_lrc.c|  55 +++
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>   include/uapi/drm/i915_drm.h |  43 ++
> >>   4 files changed, 288 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 8a12984e7495..6d6220634e9e 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
> >> *dev, void *data,
> >>  return 0;
> >>   }
> >>   
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> + struct intel_engine_cs *engine,
> >> + struct intel_sseu sseu)
> >> +{
> >> +   struct drm_i915_private *i915 = ctx->i915;
> >> +   struct 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Tvrtko Ursulin


On 14/08/2018 15:59, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

 struct drm_i915_gem_context_param arg;
 struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
 .instance = 0, };

 memset(, 0, sizeof(arg));
 arg.ctx_id = ctx;
 arg.param = I915_CONTEXT_PARAM_SSEU;
 arg.value = (uintptr_t) 
 if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
 sseu.packed.subslice_mask = 0;

 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
 }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a 
global
 dependency (Chris)
 Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
 Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
 s/dev_priv/i915/ (Tvrtko)
 Change uapi class/instance fields to u16 (Tvrtko)
 Bump mask fields to 64bits (Lionel)
 Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
 reconfiguring powergated slice/subslices (Chris)
 Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
  * Update for upstream changes.
  * Request submit needs a RPM reference.
  * Reject on !FULL_PPGTT for simplicity.
  * Pull out get/set param to helpers for readability and less indent.
  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
on the same timeline and avoid GEM_BUG_ON.
  * No need to explicitly assign a NULL pointer to engine in legacy mode.
  * No need to move gen8_make_rpcs up.
  * Factored out global barrier as prep patch.
  * Allow to only CAP_SYS_ADMIN if !Gen11.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
Cc: Dmitry Rogozhkin 
Cc: Tvrtko Ursulin 
Cc: Zhipeng Gong 
Cc: Joonas Lahtinen 
Signed-off-by: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++-
  drivers/gpu/drm/i915/intel_lrc.c|  55 +++
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h |  43 ++
  4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 8a12984e7495..6d6220634e9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
 return 0;
  }
  
+static int

+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine,
+ struct intel_sseu sseu)
+{
+   struct drm_i915_private *i915 = ctx->i915;
+   struct i915_request *rq;
+   struct intel_ring *ring;
+   int ret;
+
+   lockdep_assert_held(>drm.struct_mutex);
+
+   /* Submitting requests etc needs the hw awake. */
+   intel_runtime_pm_get(i915);
+
+   i915_retire_requests(i915);


?


I wondered myself but did not make myself dig through all the history of 
the series. Cannot think that it does anything useful in the current design.



+
+   /* Now use the RCS to actually reconfigure. */
+   engine = i915->engine[RCS];


? Modifying registers stored in another engine's context image.


Well, I was hoping design was 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Lionel Landwerlin

On 14/08/18 17:05, Lionel Landwerlin wrote:

On 14/08/18 16:18, Chris Wilson wrote:

Quoting Lionel Landwerlin (2018-08-14 16:11:36)

On 14/08/18 15:59, Chris Wilson wrote:

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris


Hangs on Gen9 :(

How does modifying the context image of an idle (unpinned) context cause
a hang?
-Chris


I thought you meant a context modifying it's own RPCS register... no?

-
Lionel


Oh I get it now... Sorry, forget me :)

-
Lionel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Lionel Landwerlin

On 14/08/18 16:18, Chris Wilson wrote:

Quoting Lionel Landwerlin (2018-08-14 16:11:36)

On 14/08/18 15:59, Chris Wilson wrote:

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris


Hangs on Gen9 :(

How does modifying the context image of an idle (unpinned) context cause
a hang?
-Chris


I thought you meant a context modifying it's own RPCS register... no?

-
Lionel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> + struct intel_engine_cs *engine,
> + struct intel_sseu sseu)
> +{
> +   struct drm_i915_private *i915 = ctx->i915;
> +   struct i915_request *rq;
> +   struct intel_ring *ring;
> +   int ret;
> +
> +   lockdep_assert_held(>drm.struct_mutex);
> +
> +   /* Submitting requests etc needs the hw awake. */
> +   intel_runtime_pm_get(i915);
> +
> +   i915_retire_requests(i915);
> +
> +   /* Now use the RCS to actually reconfigure. */
> +   engine = i915->engine[RCS];
> +
> +   rq = i915_request_alloc(engine, i915->kernel_context);
> +   if (IS_ERR(rq)) {
> +   ret = PTR_ERR(rq);
> +   goto out_put;
> +   }
> +
> +   ret = engine->emit_rpcs_config(rq, ctx, sseu);
> +   if (ret)
> +   goto out_add;
> +
> +   /* Queue this switch after all other activity */
> +   list_for_each_entry(ring, >gt.active_rings, active_link) {
> +   struct i915_request *prev;
> +
> +   prev = last_request_on_engine(ring->timeline, engine);
> +   if (prev)
> +   i915_sw_fence_await_sw_fence_gfp(>submit,
> +>submit,
> +I915_FENCE_GFP);
> +   }
> +
> +   i915_gem_set_global_barrier(i915, rq);
> +
> +out_add:
> +   i915_request_add(rq);
> +out_put:
> +   intel_runtime_pm_put(i915);
> +
> +   return ret;

Looks like we should be able to hook this up to a selftest to confirm
the modification does land in the target context image, and a SRM to
confirm it loaded.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Chris Wilson
Quoting Lionel Landwerlin (2018-08-14 16:11:36)
> On 14/08/18 15:59, Chris Wilson wrote:
> > And I'd still recommend not using indirect access if we can apply the
> > changes immediately.
> > -Chris
> >
> 
> Hangs on Gen9 :(

How does modifying the context image of an idle (unpinned) context cause
a hang?
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Lionel Landwerlin

On 14/08/18 15:59, Chris Wilson wrote:

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris



Hangs on Gen9 :(

-
Lionel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-08-14 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> From: Chris Wilson 
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices it is not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> struct drm_i915_gem_context_param arg;
> struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> .instance = 0, };
> 
> memset(, 0, sizeof(arg));
> arg.ctx_id = ctx;
> arg.param = I915_CONTEXT_PARAM_SSEU;
> arg.value = (uintptr_t) 
> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
> sseu.packed.subslice_mask = 0;
> 
> drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
> }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context 
> (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a 
> global
> dependency (Chris)
> Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
> Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> 
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
> s/dev_priv/i915/ (Tvrtko)
> Change uapi class/instance fields to u16 (Tvrtko)
> Bump mask fields to 64bits (Lionel)
> Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> 
> v9: Import context image into kernel context's ppgtt only when
> reconfiguring powergated slice/subslices (Chris)
> Use aliasing ppgtt when needed (Michel)
> 
> Tvrtko Ursulin:
> 
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow to only CAP_SYS_ADMIN if !Gen11.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson 
> Signed-off-by: Lionel Landwerlin 
> Cc: Dmitry Rogozhkin 
> Cc: Tvrtko Ursulin 
> Cc: Zhipeng Gong 
> Cc: Joonas Lahtinen 
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++-
>  drivers/gpu/drm/i915/intel_lrc.c|  55 +++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>  include/uapi/drm/i915_drm.h |  43 ++
>  4 files changed, 288 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8a12984e7495..6d6220634e9e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
> *dev, void *data,
> return 0;
>  }
>  
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> + struct intel_engine_cs *engine,
> + struct intel_sseu sseu)
> +{
> +   struct drm_i915_private *i915 = ctx->i915;
> +   struct i915_request *rq;
> +   struct intel_ring *ring;
> +   int ret;
> +
> +   lockdep_assert_held(>drm.struct_mutex);
> +
> +   /* Submitting requests etc needs the hw awake. */
> +   intel_runtime_pm_get(i915);
> +
> +   i915_retire_requests(i915);

?

> +
> +   /* Now use the RCS to actually reconfigure. */
> +   engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

> +
> +   rq = 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-09 Thread Lionel Landwerlin

On 08/05/18 21:56, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-05-03 18:18:43)

On 25/04/2018 12:45, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

   struct drm_i915_gem_context_param arg;
   struct drm_i915_gem_context_param_sseu sseu = { .flags = 
I915_EXEC_RENDER };

   memset(, 0, sizeof(arg));
   arg.ctx_id = ctx;
   arg.param = I915_CONTEXT_PARAM_SSEU;
   arg.value = (uintptr_t) 
   if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
   sseu.packed.subslice_mask = 0;

   drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
   }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
   drivers/gpu/drm/i915/i915_gem_context.c | 82 -
   drivers/gpu/drm/i915/intel_lrc.c| 55 +
   drivers/gpu/drm/i915/intel_lrc.h|  4 ++
   include/uapi/drm/i915_drm.h | 28 +
   4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
   return 0;
   }
   
+static struct i915_gem_context_sseu

+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+  const struct drm_i915_gem_context_param_sseu 
*user_sseu)
+{
+ struct i915_gem_context_sseu value = {
+ .slice_mask = user_sseu->packed.slice_mask == 0 ?
+   sseu->slice_mask :
+   (user_sseu->packed.slice_mask & sseu->slice_mask),
+ .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
+  sseu->subslice_mask[0] :
+  (user_sseu->packed.subslice_mask & 
sseu->subslice_mask[0]),
+ .min_eus_per_subslice = 
min(user_sseu->packed.min_eus_per_subslice,
+ sseu->max_eus_per_subslice),
+ .max_eus_per_subslice = 
min(user_sseu->packed.max_eus_per_subslice,
+ sseu->max_eus_per_subslice),
+ };
+
+ return value;
+}
+
   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file)
   {
@@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
   case I915_CONTEXT_PARAM_PRIORITY:
   args->value = ctx->sched.priority;
   break;
+ case I915_CONTEXT_PARAM_SSEU: {
+ struct drm_i915_gem_context_param_sseu param_sseu;
+ struct intel_engine_cs *engine;
+
+ if (copy_from_user(_sseu, u64_to_user_ptr(args->value),
+sizeof(param_sseu))) {
+ ret = -EFAULT;
+ break;
+ }
+
+ engine = i915_gem_engine_from_flags(to_i915(dev), file,
+ param_sseu.flags);
+ if (!engine) {
+ ret = -EINVAL;
+ break;
+ }
+
+ param_sseu.packed.slice_mask =
+ ctx->engine[engine->id].sseu.slice_mask;
+ param_sseu.packed.subslice_mask =
+ 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-08 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-05-03 18:18:43)
> 
> On 25/04/2018 12:45, Lionel Landwerlin wrote:
> > From: Chris Wilson 
> > 
> > We want to allow userspace to reconfigure the subslice configuration for
> > its own use case. To do so, we expose a context parameter to allow
> > adjustment of the RPCS register stored within the context image (and
> > currently not accessible via LRI). If the context is adjusted before
> > first use, the adjustment is for "free"; otherwise if the context is
> > active we flush the context off the GPU (stalling all users) and forcing
> > the GPU to save the context to memory where we can modify it and so
> > ensure that the register is reloaded on next execution.
> > 
> > The overhead of managing additional EU subslices can be significant,
> > especially in multi-context workloads. Non-GPGPU contexts should
> > preferably disable the subslices it is not using, and others should
> > fine-tune the number to match their workload.
> > 
> > We expose complete control over the RPCS register, allowing
> > configuration of slice/subslice, via masks packed into a u64 for
> > simplicity. For example,
> > 
> >   struct drm_i915_gem_context_param arg;
> >   struct drm_i915_gem_context_param_sseu sseu = { .flags = 
> > I915_EXEC_RENDER };
> > 
> >   memset(, 0, sizeof(arg));
> >   arg.ctx_id = ctx;
> >   arg.param = I915_CONTEXT_PARAM_SSEU;
> >   arg.value = (uintptr_t) 
> >   if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
> >   sseu.packed.subslice_mask = 0;
> > 
> >   drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
> >   }
> > 
> > could be used to disable all subslices where supported.
> > 
> > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> > (Lionel)
> > 
> > v3: Add ability to program this per engine (Chris)
> > 
> > v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Lionel Landwerlin 
> > c: Dmitry Rogozhkin 
> > CC: Tvrtko Ursulin 
> > CC: Zhipeng Gong 
> > CC: Joonas Lahtinen 
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c | 82 -
> >   drivers/gpu/drm/i915/intel_lrc.c| 55 +
> >   drivers/gpu/drm/i915/intel_lrc.h|  4 ++
> >   include/uapi/drm/i915_drm.h | 28 +
> >   4 files changed, 168 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/i915_gem_context.c
> > index bdf050beeb94..b97ddcf47514 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
> > *dev, void *data,
> >   return 0;
> >   }
> >   
> > +static struct i915_gem_context_sseu
> > +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> > +  const struct 
> > drm_i915_gem_context_param_sseu *user_sseu)
> > +{
> > + struct i915_gem_context_sseu value = {
> > + .slice_mask = user_sseu->packed.slice_mask == 0 ?
> > +   sseu->slice_mask :
> > +   (user_sseu->packed.slice_mask & 
> > sseu->slice_mask),
> > + .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> > +  sseu->subslice_mask[0] :
> > +  (user_sseu->packed.subslice_mask & 
> > sseu->subslice_mask[0]),
> > + .min_eus_per_subslice = 
> > min(user_sseu->packed.min_eus_per_subslice,
> > + sseu->max_eus_per_subslice),
> > + .max_eus_per_subslice = 
> > min(user_sseu->packed.max_eus_per_subslice,
> > + sseu->max_eus_per_subslice),
> > + };
> > +
> > + return value;
> > +}
> > +
> >   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >   struct drm_file *file)
> >   {
> > @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
> > *dev, void *data,
> >   case I915_CONTEXT_PARAM_PRIORITY:
> >   args->value = ctx->sched.priority;
> >   break;
> > + case I915_CONTEXT_PARAM_SSEU: {
> > + struct drm_i915_gem_context_param_sseu param_sseu;
> > + struct intel_engine_cs *engine;
> > +
> > + if (copy_from_user(_sseu, u64_to_user_ptr(args->value),
> > +sizeof(param_sseu))) {
> > + ret = -EFAULT;
> > + break;
> > + }
> > +
> > + engine = 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-08 Thread Rogozhkin, Dmitry V
>> Context of the question was whether you need to change slice configuration 
>> for a single GEM context between submitting batch buffers?

When we create a context we know the optimal slice count for it. And this 
optimal point does not change for this context in unperturbed conditions, i.e. 
when context runs alone. However, there are critical use cases when we never 
run single context, but instead we run few contexts in parallel and each of 
them has its own optimal slice count operating point. As a result, major 
question is: what is the cost of context switch if there is associated slice 
configuration change, powering on or off the slice(s)? In the ideal situation 
when the cost is 0, we don't need any change of slice configuration for single 
GEM context between submitting batch buffers. Problem is that cost is far from 
0. And the cost is non-tolerable for the worst case when we will have a switch 
at each next batch buffer. As a result, we are forced to have some negotiation 
channel between different contexts and make them agree on some single slice 
configuration which will exist for reasonably long period of time to have 
associated cost negligible in genera. During this period we will submit a 
number of batch buffers before next reconfiguration attempt. So, that's the 
situation when we need to reconfigure slice configuration for a single GEM 
context between submitting batches.

Dmitry.

-Original Message-
From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] 
Sent: Tuesday, May 8, 2018 1:25 AM
To: Rogozhkin, Dmitry V <dmitry.v.rogozh...@intel.com>; Landwerlin, Lionel G 
<lionel.g.landwer...@intel.com>; intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration 
to userspace


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:
>>>  I'm pretty sure Dmitry wants dynamic configurations.
> 
> Yes, I afraid we really need dynamic slice configurations for media.

Context of the question was whether you need to change slice configuration for 
a single GEM context between submitting batch buffers?

Regards,

Tvrtko


> *From:*Landwerlin, Lionel G
> *Sent:* Friday, May 4, 2018 9:25 AM
> *To:* Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; 
> intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
> <dmitry.v.rogozh...@intel.com>
> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
> configuration to userspace
> 
> On 03/05/18 18:18, Tvrtko Ursulin wrote:
> 
>    +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +  struct intel_engine_cs *engine,
> +  struct i915_gem_context_sseu *sseu)
> +{
> +    struct drm_i915_private *dev_priv = ctx->i915;
> +    struct intel_context *ce;
> +    enum intel_engine_id id;
> +    int ret;
> +
> +    lockdep_assert_held(_priv->drm.struct_mutex);
> +
> +    if (memcmp(sseu, >engine[engine->id].sseu,
> sizeof(*sseu)) == 0)
> +    return 0;
> +
> +    /*
> + * We can only program this on render ring.
> + */
> +    ce = >engine[RCS];
> +
> +    if (ce->pin_count) { /* Assume that the context is active! */
> +    ret = i915_gem_switch_to_kernel_context(dev_priv);
> +    if (ret)
> +    return ret;
> +
> +    ret = i915_gem_wait_for_idle(dev_priv,
> + I915_WAIT_INTERRUPTIBLE |
> + I915_WAIT_LOCKED);
> 
> 
> Could we consider the alternative of only allowing this to be
> configured on context create? That way we would not need to idle the
> GPU every time userspace decides to fiddle with it. It is
> unprivileged so quite an easy way for random app to ruin GPU
> performance for everyone.
> 
> Regards,
> 
> Tvrtko
> 
> I'm pretty sure Dmitry wants dynamic configurations.
> 
> Dmitry?
> 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-08 Thread Tvrtko Ursulin


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:

 I'm pretty sure Dmitry wants dynamic configurations.


Yes, I afraid we really need dynamic slice configurations for media.


Context of the question was whether you need to change slice 
configuration for a single GEM context between submitting batch buffers?


Regards,

Tvrtko



*From:*Landwerlin, Lionel G
*Sent:* Friday, May 4, 2018 9:25 AM
*To:* Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; 
intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
<dmitry.v.rogozh...@intel.com>
*Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
configuration to userspace


On 03/05/18 18:18, Tvrtko Ursulin wrote:

   +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  struct i915_gem_context_sseu *sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct intel_context *ce;
+    enum intel_engine_id id;
+    int ret;
+
+    lockdep_assert_held(_priv->drm.struct_mutex);
+
+    if (memcmp(sseu, >engine[engine->id].sseu,
sizeof(*sseu)) == 0)
+    return 0;
+
+    /*
+ * We can only program this on render ring.
+ */
+    ce = >engine[RCS];
+
+    if (ce->pin_count) { /* Assume that the context is active! */
+    ret = i915_gem_switch_to_kernel_context(dev_priv);
+    if (ret)
+    return ret;
+
+    ret = i915_gem_wait_for_idle(dev_priv,
+ I915_WAIT_INTERRUPTIBLE |
+ I915_WAIT_LOCKED);


Could we consider the alternative of only allowing this to be
configured on context create? That way we would not need to idle the
GPU every time userspace decides to fiddle with it. It is
unprivileged so quite an easy way for random app to ruin GPU
performance for everyone.

Regards,

Tvrtko

I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-07 Thread Rogozhkin, Dmitry V
>> I'm pretty sure Dmitry wants dynamic configurations.

Yes, I afraid we really need dynamic slice configurations for media.

From: Landwerlin, Lionel G
Sent: Friday, May 4, 2018 9:25 AM
To: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>; 
intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
<dmitry.v.rogozh...@intel.com>
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration 
to userspace

On 03/05/18 18:18, Tvrtko Ursulin wrote:
  +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  struct i915_gem_context_sseu *sseu)
+{
+struct drm_i915_private *dev_priv = ctx->i915;
+struct intel_context *ce;
+enum intel_engine_id id;
+int ret;
+
+lockdep_assert_held(_priv->drm.struct_mutex);
+
+if (memcmp(sseu, >engine[engine->id].sseu, sizeof(*sseu)) == 0)
+return 0;
+
+/*
+ * We can only program this on render ring.
+ */
+ce = >engine[RCS];
+
+if (ce->pin_count) { /* Assume that the context is active! */
+ret = i915_gem_switch_to_kernel_context(dev_priv);
+if (ret)
+return ret;
+
+ret = i915_gem_wait_for_idle(dev_priv,
+ I915_WAIT_INTERRUPTIBLE |
+ I915_WAIT_LOCKED);

Could we consider the alternative of only allowing this to be configured on 
context create? That way we would not need to idle the GPU every time userspace 
decides to fiddle with it. It is unprivileged so quite an easy way for random 
app to ruin GPU performance for everyone.

Regards,

Tvrtko


I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-04 Thread Lionel Landwerlin

On 03/05/18 18:18, Tvrtko Ursulin wrote:

  +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  struct i915_gem_context_sseu *sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct intel_context *ce;
+    enum intel_engine_id id;
+    int ret;
+
+    lockdep_assert_held(_priv->drm.struct_mutex);
+
+    if (memcmp(sseu, >engine[engine->id].sseu, sizeof(*sseu)) 
== 0)

+    return 0;
+
+    /*
+ * We can only program this on render ring.
+ */
+    ce = >engine[RCS];
+
+    if (ce->pin_count) { /* Assume that the context is active! */
+    ret = i915_gem_switch_to_kernel_context(dev_priv);
+    if (ret)
+    return ret;
+
+    ret = i915_gem_wait_for_idle(dev_priv,
+ I915_WAIT_INTERRUPTIBLE |
+ I915_WAIT_LOCKED);


Could we consider the alternative of only allowing this to be 
configured on context create? That way we would not need to idle the 
GPU every time userspace decides to fiddle with it. It is unprivileged 
so quite an easy way for random app to ruin GPU performance for everyone.


Regards,

Tvrtko 



I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-03 Thread Tvrtko Ursulin


On 25/04/2018 12:45, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .flags = 
I915_EXEC_RENDER };

memset(, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) 
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
sseu.packed.subslice_mask = 0;

drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 82 -
  drivers/gpu/drm/i915/intel_lrc.c| 55 +
  drivers/gpu/drm/i915/intel_lrc.h|  4 ++
  include/uapi/drm/i915_drm.h | 28 +
  4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
return 0;
  }
  
+static struct i915_gem_context_sseu

+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+const struct 
drm_i915_gem_context_param_sseu *user_sseu)
+{
+   struct i915_gem_context_sseu value = {
+   .slice_mask = user_sseu->packed.slice_mask == 0 ?
+ sseu->slice_mask :
+ (user_sseu->packed.slice_mask & sseu->slice_mask),
+   .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
+sseu->subslice_mask[0] :
+(user_sseu->packed.subslice_mask & 
sseu->subslice_mask[0]),
+   .min_eus_per_subslice = 
min(user_sseu->packed.min_eus_per_subslice,
+   sseu->max_eus_per_subslice),
+   .max_eus_per_subslice = 
min(user_sseu->packed.max_eus_per_subslice,
+   sseu->max_eus_per_subslice),
+   };
+
+   return value;
+}
+
  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
  {
@@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
case I915_CONTEXT_PARAM_PRIORITY:
args->value = ctx->sched.priority;
break;
+   case I915_CONTEXT_PARAM_SSEU: {
+   struct drm_i915_gem_context_param_sseu param_sseu;
+   struct intel_engine_cs *engine;
+
+   if (copy_from_user(_sseu, u64_to_user_ptr(args->value),
+  sizeof(param_sseu))) {
+   ret = -EFAULT;
+   break;
+   }
+
+   engine = i915_gem_engine_from_flags(to_i915(dev), file,
+   param_sseu.flags);
+   if (!engine) {
+   ret = -EINVAL;
+   break;
+   }
+
+   param_sseu.packed.slice_mask =
+   ctx->engine[engine->id].sseu.slice_mask;
+   param_sseu.packed.subslice_mask =
+   

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-03 Thread Tvrtko Ursulin


On 03/05/2018 17:04, Joonas Lahtinen wrote:

Quoting Lionel Landwerlin (2018-04-26 13:22:30)

On 26/04/18 11:00, Joonas Lahtinen wrote:

Quoting Lionel Landwerlin (2018-04-25 14:45:21)

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just raw setter.

Regards, Joonas


I've tried to come up with some numbers about the cost of the back &
forth (see igt series).

Other than that, I don't think we can expect the kernel to workaround
the inefficient use of the hardware by userspace.


Well, I'm pretty sure we should not try to make the situation too
miserable for the basic usecases.

If we allow two contexts to fight over the slice count, countering any
perceived benefit, I don't think that'll be a good default.

My recollection of the last round of discussion was that reasonable
thing to do would be to only disable slices when everyone is willing to
let go of. Then it would become a system maintainer level decision to
only run on two slices for when they see fit (configuring the userspace).


Yes there was definitely talk about co-ordination between OpenCL and 
media stacks.


Btw we mentioned on IRC a few days back that a sysfs knob like 
allow_media_priority, defaulting to 0, could be a good start.


That way userspace could use the uAPI, but unless sysadmin allows it to, 
there will be no fighting over slice counts.


Then at some future time, as long as the uAPI is sensible, more advanced 
policies could be explored. Regardless if in userspace or i915.


Regards,

Tvrtko


More advanced tactics would include scheduling work so that we try to
avoid the slice count changes and deduct the switching time from the
execution budget of the app requesting less slices (if we had fair
time slicing).

Regards, Joonas






We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

  struct drm_i915_gem_context_param arg;
  struct drm_i915_gem_context_param_sseu sseu = { .flags = 
I915_EXEC_RENDER };

  memset(, 0, sizeof(arg));
  arg.ctx_id = ctx;
  arg.param = I915_CONTEXT_PARAM_SSEU;
  arg.value = (uintptr_t) 
  if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
  sseu.packed.subslice_mask = 0;

  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
  }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
   drivers/gpu/drm/i915/i915_gem_context.c | 82 -
   drivers/gpu/drm/i915/intel_lrc.c| 55 +
   drivers/gpu/drm/i915/intel_lrc.h|  4 ++
   include/uapi/drm/i915_drm.h | 28 +
   4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
  return 0;
   }
   
+static struct i915_gem_context_sseu

+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+const struct 
drm_i915_gem_context_param_sseu *user_sseu)
+{
+   struct 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-03 Thread Lionel Landwerlin

On 03/05/18 17:04, Joonas Lahtinen wrote:

Quoting Lionel Landwerlin (2018-04-26 13:22:30)

On 26/04/18 11:00, Joonas Lahtinen wrote:

Quoting Lionel Landwerlin (2018-04-25 14:45:21)

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just raw setter.

Regards, Joonas

I've tried to come up with some numbers about the cost of the back &
forth (see igt series).

Other than that, I don't think we can expect the kernel to workaround
the inefficient use of the hardware by userspace.

Well, I'm pretty sure we should not try to make the situation too
miserable for the basic usecases.

If we allow two contexts to fight over the slice count, countering any
perceived benefit, I don't think that'll be a good default.

My recollection of the last round of discussion was that reasonable
thing to do would be to only disable slices when everyone is willing to
let go of. Then it would become a system maintainer level decision to
only run on two slices for when they see fit (configuring the userspace).


How would you detect that everybody is willing to let go?
We don't appear to have a mechanism to detect that.
Wouldn't that require to teach all userspace drivers?



More advanced tactics would include scheduling work so that we try to
avoid the slice count changes and deduct the switching time from the
execution budget of the app requesting less slices (if we had fair
time slicing).


That sounds more workable, although fairly complicated.
Maybe we could tweak the priority based on the slice count and say :
    for the next (1second / number of slice config), priority bumped 
for the configuration with x slices

    and rotate every (1second / number of slice config)

Would that resolve the back & forth issue completely though?

Moving a particular context back & forth between different 
configurations is costly too.
You need to drain the context, then pin the context before you can edit 
and resubmit.


Thanks for your feedback,

-
Lionel



Regards, Joonas



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-03 Thread Chris Wilson
Quoting Joonas Lahtinen (2018-05-03 17:04:31)
> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting less slices (if we had fair
> time slicing).

Have you been peeking at my reading material? ;)
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-03 Thread Joonas Lahtinen
Quoting Lionel Landwerlin (2018-04-26 13:22:30)
> On 26/04/18 11:00, Joonas Lahtinen wrote:
> > Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> >> From: Chris Wilson 
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and forcing
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices it is not using, and others should
> >> fine-tune the number to match their workload.
> > This hit a dead end last time due to the system wide policy needed to
> > avoid two parties fighting over the slice count (and going back and
> > forth between two slice counts would counter the benefits received from
> > this).
> >
> > Do we now have a solution for the contention? I don't see code to
> > negotiate a global value, just raw setter.
> >
> > Regards, Joonas
> 
> I've tried to come up with some numbers about the cost of the back & 
> forth (see igt series).
> 
> Other than that, I don't think we can expect the kernel to workaround 
> the inefficient use of the hardware by userspace.

Well, I'm pretty sure we should not try to make the situation too
miserable for the basic usecases.

If we allow two contexts to fight over the slice count, countering any
perceived benefit, I don't think that'll be a good default.

My recollection of the last round of discussion was that reasonable
thing to do would be to only disable slices when everyone is willing to
let go of. Then it would become a system maintainer level decision to
only run on two slices for when they see fit (configuring the userspace).

More advanced tactics would include scheduling work so that we try to
avoid the slice count changes and deduct the switching time from the
execution budget of the app requesting less slices (if we had fair
time slicing).

Regards, Joonas

> 
> >
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>  struct drm_i915_gem_context_param arg;
> >>  struct drm_i915_gem_context_param_sseu sseu = { .flags = 
> >> I915_EXEC_RENDER };
> >>
> >>  memset(, 0, sizeof(arg));
> >>  arg.ctx_id = ctx;
> >>  arg.param = I915_CONTEXT_PARAM_SSEU;
> >>  arg.value = (uintptr_t) 
> >>  if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) 
> >> {
> >>  sseu.packed.subslice_mask = 0;
> >>
> >>  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
> >>  }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> >> (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson 
> >> Signed-off-by: Lionel Landwerlin 
> >> c: Dmitry Rogozhkin 
> >> CC: Tvrtko Ursulin 
> >> CC: Zhipeng Gong 
> >> CC: Joonas Lahtinen 
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 82 -
> >>   drivers/gpu/drm/i915/intel_lrc.c| 55 +
> >>   drivers/gpu/drm/i915/intel_lrc.h|  4 ++
> >>   include/uapi/drm/i915_drm.h | 28 +
> >>   4 files changed, 168 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index bdf050beeb94..b97ddcf47514 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
> >> *dev, void *data,
> >>  return 0;
> >>   }
> >>   
> >> +static struct i915_gem_context_sseu
> >> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +const struct 
> >> drm_i915_gem_context_param_sseu *user_sseu)
> >> +{
> >> +   struct i915_gem_context_sseu value = {
> >> +   .slice_mask = user_sseu->packed.slice_mask == 0 ?
> >> + sseu->slice_mask :
> 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-04-26 Thread Lionel Landwerlin

On 26/04/18 11:00, Joonas Lahtinen wrote:

Quoting Lionel Landwerlin (2018-04-25 14:45:21)

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just raw setter.

Regards, Joonas


I've tried to come up with some numbers about the cost of the back & 
forth (see igt series).


Other than that, I don't think we can expect the kernel to workaround 
the inefficient use of the hardware by userspace.





We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

 struct drm_i915_gem_context_param arg;
 struct drm_i915_gem_context_param_sseu sseu = { .flags = 
I915_EXEC_RENDER };

 memset(, 0, sizeof(arg));
 arg.ctx_id = ctx;
 arg.param = I915_CONTEXT_PARAM_SSEU;
 arg.value = (uintptr_t) 
 if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
 sseu.packed.subslice_mask = 0;

 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
 }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 82 -
  drivers/gpu/drm/i915/intel_lrc.c| 55 +
  drivers/gpu/drm/i915/intel_lrc.h|  4 ++
  include/uapi/drm/i915_drm.h | 28 +
  4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, 
void *data,
 return 0;
  }
  
+static struct i915_gem_context_sseu

+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+const struct 
drm_i915_gem_context_param_sseu *user_sseu)
+{
+   struct i915_gem_context_sseu value = {
+   .slice_mask = user_sseu->packed.slice_mask == 0 ?
+ sseu->slice_mask :
+ (user_sseu->packed.slice_mask & sseu->slice_mask),
+   .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
+sseu->subslice_mask[0] :
+(user_sseu->packed.subslice_mask & 
sseu->subslice_mask[0]),
+   .min_eus_per_subslice = 
min(user_sseu->packed.min_eus_per_subslice,
+   sseu->max_eus_per_subslice),
+   .max_eus_per_subslice = 
min(user_sseu->packed.max_eus_per_subslice,
+   sseu->max_eus_per_subslice),
+   };
+
+   return value;
+}
+
  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file)
  {
@@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
*dev, void *data,
 case I915_CONTEXT_PARAM_PRIORITY:
 args->value = ctx->sched.priority;
 break;
+   case I915_CONTEXT_PARAM_SSEU: {
+   struct drm_i915_gem_context_param_sseu param_sseu;
+   struct intel_engine_cs *engine;
+
+   if (copy_from_user(_sseu, 

Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-04-26 Thread Joonas Lahtinen
Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> From: Chris Wilson 
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices it is not using, and others should
> fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just raw setter.

Regards, Joonas

> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> struct drm_i915_gem_context_param arg;
> struct drm_i915_gem_context_param_sseu sseu = { .flags = 
> I915_EXEC_RENDER };
> 
> memset(, 0, sizeof(arg));
> arg.ctx_id = ctx;
> arg.param = I915_CONTEXT_PARAM_SSEU;
> arg.value = (uintptr_t) 
> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, ) == 0) {
> sseu.packed.subslice_mask = 0;
> 
> drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, );
> }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson 
> Signed-off-by: Lionel Landwerlin 
> c: Dmitry Rogozhkin 
> CC: Tvrtko Ursulin 
> CC: Zhipeng Gong 
> CC: Joonas Lahtinen 
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 82 -
>  drivers/gpu/drm/i915/intel_lrc.c| 55 +
>  drivers/gpu/drm/i915/intel_lrc.h|  4 ++
>  include/uapi/drm/i915_drm.h | 28 +
>  4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index bdf050beeb94..b97ddcf47514 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
> *dev, void *data,
> return 0;
>  }
>  
> +static struct i915_gem_context_sseu
> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +const struct 
> drm_i915_gem_context_param_sseu *user_sseu)
> +{
> +   struct i915_gem_context_sseu value = {
> +   .slice_mask = user_sseu->packed.slice_mask == 0 ?
> + sseu->slice_mask :
> + (user_sseu->packed.slice_mask & 
> sseu->slice_mask),
> +   .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> +sseu->subslice_mask[0] :
> +(user_sseu->packed.subslice_mask & 
> sseu->subslice_mask[0]),
> +   .min_eus_per_subslice = 
> min(user_sseu->packed.min_eus_per_subslice,
> +   sseu->max_eus_per_subslice),
> +   .max_eus_per_subslice = 
> min(user_sseu->packed.max_eus_per_subslice,
> +   sseu->max_eus_per_subslice),
> +   };
> +
> +   return value;
> +}
> +
>  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file)
>  {
> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
> *dev, void *data,
> case I915_CONTEXT_PARAM_PRIORITY:
> args->value = ctx->sched.priority;
> break;
> +   case I915_CONTEXT_PARAM_SSEU: {
> +   struct drm_i915_gem_context_param_sseu param_sseu;
> +   struct intel_engine_cs *engine;
> +
> +   if (copy_from_user(_sseu, u64_to_user_ptr(args->value),
> +  sizeof(param_sseu))) {
> +