Re: [Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-12 Thread Robert Bragg
On Wed, Apr 12, 2017 at 12:33 PM, Matthew Auld  wrote:

> On 04/05, Robert Bragg wrote:
> > Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
> > share (more-or-less) the same OA unit design.
> >
> > Of particular note in comparison to Haswell: some OA unit HW config
> > state has become per-context state and as a consequence it is somewhat
> > more complicated to manage synchronous state changes from the cpu while
> > there's no guarantee of what context (if any) is currently actively
> > running on the gpu.
> >
> > The periodic sampling frequency which can be particularly useful for
> > system-wide analysis (as opposed to command stream synchronised
> > MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
> > have become per-context save and restored (while the OABUFFER
> > destination is still a shared, system-wide resource).
> >
> > This support for gen8+ takes care to consider a number of timing
> > challenges involved in synchronously updating per-context state
> > primarily by programming all config state from the cpu and updating all
> > current and saved contexts synchronously while the OA unit is still
> > disabled.
> >
> > The driver intentionally avoids depending on command streamer
> > programming to update OA state considering the lack of synchronization
> > between the automatic loading of OACTXCONTROL state (that includes the
> > periodic sampling state and enable state) on context restore and the
> > parsing of any general purpose BB the driver can control. I.e. this
> > implementation is careful to avoid the possibility of a context restore
> > temporarily enabling any out-of-date periodic sampling state. In
> > addition to the risk of transiently-out-of-date state being loaded
> > automatically; there are also internal HW latencies involved in the
> > loading of MUX configurations which would be difficult to account for
> > from the command streamer (and we only want to enable the unit when once
> > the MUX configuration is complete).
> >
> > Since the Gen8+ OA unit design no longer supports clock gating the unit
> > off for a single given context (which effectively stopped any progress
> > of counters while any other context was running) and instead supports
> > tagging OA reports with a context ID for filtering on the CPU, it means
> > we can no longer hide the system-wide progress of counters from a
> > non-privileged application only interested in metrics for its own
> > context. Although we could theoretically try and subtract the progress
> > of other contexts before forwarding reports via read() we aren't in a
> > position to filter reports captured via MI_REPORT_PERF_COUNT commands.
> > As a result, for Gen8+, we always require the
> > dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
> > if not root.
> >
> > Signed-off-by: Robert Bragg 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  45 +-
> >  drivers/gpu/drm/i915/i915_gem_context.h |   1 +
> >  drivers/gpu/drm/i915/i915_perf.c| 938
> +---
> >  drivers/gpu/drm/i915/i915_reg.h |  22 +
> >  drivers/gpu/drm/i915/intel_lrc.c|   5 +
> >  include/uapi/drm/i915_drm.h |  19 +-
> >  6 files changed, 937 insertions(+), 93 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 9c37b73ac7ac..3a22b6fd0ee6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2067,9 +2067,17 @@ struct i915_oa_ops {
> >   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
> >
> >   /**
> > -  * @enable_metric_set: Applies any MUX configuration to set up the
> > -  * Boolean and Custom (B/C) counters that are part of the counter
> > -  * reports being sampled. May apply system constraints such as
> > +  * @select_metric_set: The auto generated code that checks whether
> a
> > +  * requested OA config is applicable to the system and if so sets
> up
> > +  * the mux, oa and flex eu register config pointers according to
> the
> > +  * current dev_priv->perf.oa.metrics_set.
> > +  */
> > + int (*select_metric_set)(struct drm_i915_private *dev_priv);
> > +
> > + /**
> > +  * @enable_metric_set: Selects and applies any MUX configuration
> to set
> > +  * up the Boolean and Custom (B/C) counters that are part of the
> > +  * counter reports being sampled. May apply system constraints
> such as
> >* disabling EU clock gating as required.
> >*/
> >   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
> > @@ -2100,20 +2108,13 @@ struct i915_oa_ops {
> >   size_t *offset);
> >
> >   /**
> > -  * @oa_buffer_check: Check for OA buffer data + update tail
> > -  *
> > -  * This is either called via fops or the poll check hrtimer (atomic
> > -  * ctx) without 

Re: [Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-12 Thread Matthew Auld
On 04/05, Robert Bragg wrote:
> Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
> share (more-or-less) the same OA unit design.
> 
> Of particular note in comparison to Haswell: some OA unit HW config
> state has become per-context state and as a consequence it is somewhat
> more complicated to manage synchronous state changes from the cpu while
> there's no guarantee of what context (if any) is currently actively
> running on the gpu.
> 
> The periodic sampling frequency which can be particularly useful for
> system-wide analysis (as opposed to command stream synchronised
> MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
> have become per-context save and restored (while the OABUFFER
> destination is still a shared, system-wide resource).
> 
> This support for gen8+ takes care to consider a number of timing
> challenges involved in synchronously updating per-context state
> primarily by programming all config state from the cpu and updating all
> current and saved contexts synchronously while the OA unit is still
> disabled.
> 
> The driver intentionally avoids depending on command streamer
> programming to update OA state considering the lack of synchronization
> between the automatic loading of OACTXCONTROL state (that includes the
> periodic sampling state and enable state) on context restore and the
> parsing of any general purpose BB the driver can control. I.e. this
> implementation is careful to avoid the possibility of a context restore
> temporarily enabling any out-of-date periodic sampling state. In
> addition to the risk of transiently-out-of-date state being loaded
> automatically; there are also internal HW latencies involved in the
> loading of MUX configurations which would be difficult to account for
> from the command streamer (and we only want to enable the unit when once
> the MUX configuration is complete).
> 
> Since the Gen8+ OA unit design no longer supports clock gating the unit
> off for a single given context (which effectively stopped any progress
> of counters while any other context was running) and instead supports
> tagging OA reports with a context ID for filtering on the CPU, it means
> we can no longer hide the system-wide progress of counters from a
> non-privileged application only interested in metrics for its own
> context. Although we could theoretically try and subtract the progress
> of other contexts before forwarding reports via read() we aren't in a
> position to filter reports captured via MI_REPORT_PERF_COUNT commands.
> As a result, for Gen8+, we always require the
> dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
> if not root.
> 
> Signed-off-by: Robert Bragg 
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  45 +-
>  drivers/gpu/drm/i915/i915_gem_context.h |   1 +
>  drivers/gpu/drm/i915/i915_perf.c| 938 
> +---
>  drivers/gpu/drm/i915/i915_reg.h |  22 +
>  drivers/gpu/drm/i915/intel_lrc.c|   5 +
>  include/uapi/drm/i915_drm.h |  19 +-
>  6 files changed, 937 insertions(+), 93 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9c37b73ac7ac..3a22b6fd0ee6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2067,9 +2067,17 @@ struct i915_oa_ops {
>   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
>  
>   /**
> -  * @enable_metric_set: Applies any MUX configuration to set up the
> -  * Boolean and Custom (B/C) counters that are part of the counter
> -  * reports being sampled. May apply system constraints such as
> +  * @select_metric_set: The auto generated code that checks whether a
> +  * requested OA config is applicable to the system and if so sets up
> +  * the mux, oa and flex eu register config pointers according to the
> +  * current dev_priv->perf.oa.metrics_set.
> +  */
> + int (*select_metric_set)(struct drm_i915_private *dev_priv);
> +
> + /**
> +  * @enable_metric_set: Selects and applies any MUX configuration to set
> +  * up the Boolean and Custom (B/C) counters that are part of the
> +  * counter reports being sampled. May apply system constraints such as
>* disabling EU clock gating as required.
>*/
>   int (*enable_metric_set)(struct drm_i915_private *dev_priv);
> @@ -2100,20 +2108,13 @@ struct i915_oa_ops {
>   size_t *offset);
>  
>   /**
> -  * @oa_buffer_check: Check for OA buffer data + update tail
> -  *
> -  * This is either called via fops or the poll check hrtimer (atomic
> -  * ctx) without any locks taken.
> +  * @oa_hw_tail_read: read the OA tail pointer register
>*
> -  * It's safe to read OA config state here unlocked, assuming that this
> -  * is only called while the stream is enabled, while the global OA
> -  * configuration 

Re: [Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-06 Thread Lionel Landwerlin

On 05/04/17 17:23, Robert Bragg wrote:

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

Signed-off-by: Robert Bragg 
---
  drivers/gpu/drm/i915/i915_drv.h |  45 +-
  drivers/gpu/drm/i915/i915_gem_context.h |   1 +
  drivers/gpu/drm/i915/i915_perf.c| 938 +---
  drivers/gpu/drm/i915/i915_reg.h |  22 +
  drivers/gpu/drm/i915/intel_lrc.c|   5 +
  include/uapi/drm/i915_drm.h |  19 +-
  6 files changed, 937 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9c37b73ac7ac..3a22b6fd0ee6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2067,9 +2067,17 @@ struct i915_oa_ops {
void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
  
  	/**

-* @enable_metric_set: Applies any MUX configuration to set up the
-* Boolean and Custom (B/C) counters that are part of the counter
-* reports being sampled. May apply system constraints such as
+* @select_metric_set: The auto generated code that checks whether a
+* requested OA config is applicable to the system and if so sets up
+* the mux, oa and flex eu register config pointers according to the
+* current dev_priv->perf.oa.metrics_set.
+*/
+   int (*select_metric_set)(struct drm_i915_private *dev_priv);
+
+   /**
+* @enable_metric_set: Selects and applies any MUX configuration to set
+* up the Boolean and Custom (B/C) counters that are part of the
+* counter reports being sampled. May apply system constraints such as
 * disabling EU clock gating as required.
 */
int (*enable_metric_set)(struct drm_i915_private *dev_priv);
@@ -2100,20 +2108,13 @@ struct i915_oa_ops {
size_t *offset);
  
  	/**

-* @oa_buffer_check: Check for OA buffer data + update tail
-*
-* This is either called via fops or the poll check hrtimer (atomic
-* ctx) without any locks taken.
+* @oa_hw_tail_read: read the OA tail pointer register
 *
-* It's safe to read OA config state here unlocked, assuming that this
-* is only called while the stream is enabled, while the global OA
-* configuration can't be modified.
-*
-* Efficiency is more important than avoiding some false positives
-* here, 

[Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-05 Thread Robert Bragg
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

Signed-off-by: Robert Bragg 
---
 drivers/gpu/drm/i915/i915_drv.h |  45 +-
 drivers/gpu/drm/i915/i915_gem_context.h |   1 +
 drivers/gpu/drm/i915/i915_perf.c| 938 +---
 drivers/gpu/drm/i915/i915_reg.h |  22 +
 drivers/gpu/drm/i915/intel_lrc.c|   5 +
 include/uapi/drm/i915_drm.h |  19 +-
 6 files changed, 937 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9c37b73ac7ac..3a22b6fd0ee6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2067,9 +2067,17 @@ struct i915_oa_ops {
void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
 
/**
-* @enable_metric_set: Applies any MUX configuration to set up the
-* Boolean and Custom (B/C) counters that are part of the counter
-* reports being sampled. May apply system constraints such as
+* @select_metric_set: The auto generated code that checks whether a
+* requested OA config is applicable to the system and if so sets up
+* the mux, oa and flex eu register config pointers according to the
+* current dev_priv->perf.oa.metrics_set.
+*/
+   int (*select_metric_set)(struct drm_i915_private *dev_priv);
+
+   /**
+* @enable_metric_set: Selects and applies any MUX configuration to set
+* up the Boolean and Custom (B/C) counters that are part of the
+* counter reports being sampled. May apply system constraints such as
 * disabling EU clock gating as required.
 */
int (*enable_metric_set)(struct drm_i915_private *dev_priv);
@@ -2100,20 +2108,13 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_check: Check for OA buffer data + update tail
-*
-* This is either called via fops or the poll check hrtimer (atomic
-* ctx) without any locks taken.
+* @oa_hw_tail_read: read the OA tail pointer register
 *
-* It's safe to read OA config state here unlocked, assuming that this
-* is only called while the stream is enabled, while the global OA
-* configuration can't be modified.
-*
-* Efficiency is more important than avoiding some false positives
-* here, which will be handled gracefully - likely