Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

2017-12-06 Thread Robert Bragg
On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <rob...@sixbynine.org> wrote:

>
> at least from what I wrote back then it looks like I was seeing a drift of
> a few milliseconds per second on SKL. I vaguely recall it being much worse
> given the frequency constants we had for Haswell.
>

Sorry I didn't actually re-read my own message properly before referencing
it :) Apparently the 2ms per second drift was for Haswell, so presumably
not quite so bad for SKL.

- Robert
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

2017-12-06 Thread Robert Bragg
On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble <
sagar.a.kam...@intel.com> wrote:

> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.
>

Hi Sagar,

I have some doubts about this analysis...

The intent behind Sourab's original approach was to determine the
frequency empirically at runtime, because the constants we have aren't
particularly accurate. Without a perfectly stable frequency that's known
very precisely, an interpolated correlation will inevitably drift, and I
think the nature of HW implies we can't expect either that stability or
that precision. The general idea had then been to try and use existing
kernel infrastructure for a problem which isn't unique to GPU clocks.
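
To make the drift concern concrete, here is a rough userspace sketch (not
code from this series) of interpolating a system time from one reference
pair and a nominal frequency. The function names and the example
frequencies are assumptions chosen for illustration; the point is only that
any error in the assumed frequency grows linearly with elapsed time.

```c
#include <stdint.h>

/* Hypothetical helper: convert a GPU tick delta to nanoseconds using an
 * assumed timestamp frequency in Hz. Whole seconds and the remainder are
 * converted separately so the multiplication cannot overflow 64 bits for
 * realistic frequencies. */
uint64_t gpu_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	uint64_t secs = ticks / freq_hz;
	uint64_t rem = ticks % freq_hz;

	return secs * 1000000000ULL + (rem * 1000000000ULL) / freq_hz;
}

/* Interpolate a system time from a single (cpu_ref_ns, gpu_ref) pair.
 * Assumes gpu_now >= gpu_ref. */
uint64_t correlate(uint64_t cpu_ref_ns, uint64_t gpu_ref,
		   uint64_t gpu_now, uint64_t nominal_hz)
{
	return cpu_ref_ns + gpu_ticks_to_ns(gpu_now - gpu_ref, nominal_hz);
}

/* Error in ns after the counter advanced by `ticks`, if the hardware
 * really runs at real_hz while nominal_hz was assumed. */
int64_t drift_ns(uint64_t ticks, uint64_t nominal_hz, uint64_t real_hz)
{
	return (int64_t)gpu_ticks_to_ns(ticks, nominal_hz) -
	       (int64_t)gpu_ticks_to_ns(ticks, real_hz);
}
```

With an assumed 12.5MHz constant and hardware that actually ticks 0.2%
slower, drift_ns() reports a 2ms error after one real second, and nothing
in a single-reference interpolation ever corrects it.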

That's not to say that a more limited, simpler solution based on frequent
re-correlation wouldn't be more than welcome if tracking an accurate
frequency is too awkward for now, but I think some things need to be
considered in that case:

- It would be good to quantify the kind of drift seen in practice to know
how frequently it's necessary to re-synchronize. It sounds like you've done
this ("as verified over variable durations") so I'm curious what kind of
drift you saw. I'd imagine you would see a significant drift over, say, one
second and it might not take much longer for the drift to even become
clearly visible to the user when plotted in a UI. For reference I once
updated the arb_timer_query test in piglit to give some insight into this
drift (
https://lists.freedesktop.org/archives/piglit/2016-September/020673.html)
and at least from what I wrote back then it looks like I was seeing a drift
of a few milliseconds per second on SKL. I vaguely recall it being much
worse given the frequency constants we had for Haswell.

- What guarantees will be promised about monotonicity of correlated system
timestamps? Will it be guaranteed that sequential reports must have
monotonically increasing timestamps? That might be fiddly if the gpu +
system clock are periodically re-correlated, so it might be good to be
clear in documentation that the correlation is best-effort only for the
sake of implementation simplicity. That would still be good for a lot of
UIs I think and there's freedom for the driver to start simple and
potentially improve later by measuring the gpu clock frequency empirically.

Currently only one correlated pair of timestamps is read when enabling the
stream and so a relatively long time is likely to pass before the stream is
disabled (seconds, minutes while a user is running a system profiler). It
seems very likely to me that these clocks are going to drift significantly
without introducing some form of periodic re-synchronization based on some
understanding of the drift that's seen.
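
As a sketch of what periodic re-synchronization with a best-effort
monotonicity guarantee could look like (an illustration only, not what the
series implements; the struct and function names are hypothetical): each
resync replaces the reference pair, and converted timestamps are clamped so
they never step backwards across a resync.

```c
#include <stdint.h>

/* Hypothetical correlation state, one per stream. */
struct correlator {
	uint64_t cpu_ref_ns;  /* system time at the last reference point */
	uint64_t gpu_ref;     /* GPU timestamp at the last reference point */
	uint64_t freq_hz;     /* assumed GPU timestamp frequency */
	uint64_t last_out_ns; /* last value handed out, used for clamping */
};

/* Take a fresh (cpu, gpu) reference pair, e.g. from a periodic timer. */
void correlator_resync(struct correlator *c, uint64_t cpu_now_ns,
		       uint64_t gpu_now)
{
	c->cpu_ref_ns = cpu_now_ns;
	c->gpu_ref = gpu_now;
}

/* Convert a GPU timestamp to system time. Assumes reports are converted
 * in order and that gpu_ts >= c->gpu_ref. */
uint64_t correlator_to_cpu_ns(struct correlator *c, uint64_t gpu_ts)
{
	uint64_t delta = gpu_ts - c->gpu_ref;
	uint64_t ns = c->cpu_ref_ns +
		(delta / c->freq_hz) * 1000000000ULL +
		((delta % c->freq_hz) * 1000000000ULL) / c->freq_hz;

	/* A resync may move the reference backwards relative to what was
	 * already reported; clamp rather than emit a decreasing value. */
	if (ns < c->last_out_ns)
		ns = c->last_out_ns;
	c->last_out_ns = ns;
	return ns;
}
```

Clamping trades accuracy for ordering: a run of samples just after a resync
can share one timestamp, which is why documenting the correlation as
best-effort seems the honest option.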

Br,
- Robert



> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
>   drm/i915/perf: Add support to correlate GPU timestamp with system time
>
> Sourab Gupta (3):
>   drm/i915/perf: Add support for collecting 64 bit timestamps with OA
> reports
>   drm/i915/perf: Extract raw GPU timestamps from OA reports
>   drm/i915/perf: Send system clock monotonic time in perf samples
>
>  drivers/gpu/drm/i915/i915_drv.h  |  11 
>  drivers/gpu/drm/i915/i915_perf.c | 124 ++
> -
>  drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>  include/uapi/drm/i915_drm.h  |  14 +
>  4 files changed, 154 insertions(+), 1 deletion(-)
>
> --
> 1.9.1
>


Re: [Intel-gfx] [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events

2017-12-05 Thread Robert Bragg
On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> Hey Sagar,
>
> Sorry for the delay looking into this series.
> I've done some userspace/UI work in GPUTop to try to correlate perf
> samples/tracepoints with i915 perf reports.
>
> I wanted to avoid having to add too much logic into the kernel and tried
> to sample both cpu clocks & gpu timestamps from userspace.
> So far that's not working. People more knowledgeable than I would have
> realized that the kernel can sneak in work into syscalls.
> So result is that 2 syscalls (one to get the cpu clock, one for the gpu
> timestamp) back to back from the same thread leads to time differences of
> anywhere from a few microseconds to in some cases close to 1 millisecond. So
> it's basically unworkable.
> Anyway the UI work won't go to waste :)
>
> I'm thinking to go with your approach.
> From my experiment with gputop, it seems we might want to use a different
> cpu clock source though or make it configurable.
> The perf infrastructure allows you to choose what clock you want to use.
> Since we want to avoid time adjustments on that clock (because we're adding
> deltas), a clock monotonic raw would make most sense.
>

I would guess the most generally useful clock domain to correlate with the
largest number of interesting events would surely be CLOCK_MONOTONIC, not
_MONOTONIC_RAW.

E.g. here's some discussion around why vblank events use CLOCK_MONOTONIC:
https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html
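
For anyone comparing the two domains from userspace, a minimal sketch
(the helper name is made up for illustration) that reads either clock via
clock_gettime(). CLOCK_MONOTONIC is subject to NTP rate adjustment while
CLOCK_MONOTONIC_RAW is not, so the two slowly diverge and timestamps from
one domain can't be compared directly with events stamped in the other.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for CLOCK_MONOTONIC_RAW on strict-ISO builds */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the given clock as nanoseconds.
 * Returns 0 on success, -1 if clock_gettime() fails. */
int read_clock_ns(clockid_t id, uint64_t *out_ns)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;

	*out_ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
	return 0;
}
```

Calling it once with CLOCK_MONOTONIC and once with CLOCK_MONOTONIC_RAW
gives two values whose difference changes over time on a machine running
NTP.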

Br,
- Robert


> I'll look at adding some tests for this too.
>
> Thanks,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>
>> We can compute system time corresponding to GPU timestamp by taking a
>> reference point (CPU monotonic time, GPU timestamp) and then adding
>> delta time computed using timecounter/cyclecounter support in kernel.
>> We have to configure cyclecounter with the GPU timestamp frequency.
>> Earlier approach that was based on cross-timestamp is not needed. It
>> was being used to approximate the frequency based on invalid assumptions
>> (possibly drift was being seen in the time due to precision issue).
>> The precision of time from GPU clocks is already in ns and timecounter
>> takes care of it as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>>drm/i915/perf: Add support to correlate GPU timestamp with system time
>>
>> Sourab Gupta (3):
>>drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>>  reports
>>drm/i915/perf: Extract raw GPU timestamps from OA reports
>>drm/i915/perf: Send system clock monotonic time in perf samples
>>
>>   drivers/gpu/drm/i915/i915_drv.h  |  11 
>>   drivers/gpu/drm/i915/i915_perf.c | 124 ++
>> -
>>   drivers/gpu/drm/i915/i915_reg.h  |   6 ++
>>   include/uapi/drm/i915_drm.h  |  14 +
>>   4 files changed, 154 insertions(+), 1 deletion(-)
>>
>>


[Intel-gfx] [PATCH v4 15/15] drm/i915/perf: remove perf.hook_lock

2017-04-12 Thread Robert Bragg
In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 --
 drivers/gpu/drm/i915/i915_perf.c | 32 ++--
 2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 59dcce3b40a9..94c1f5331daf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2438,8 +2438,6 @@ struct drm_i915_private {
struct mutex lock;
struct list_head streams;
 
-   spinlock_t hook_lock;
-
struct {
struct i915_perf_stream *exclusive_stream;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5de8d57e0b77..1f25a6690f61 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1690,9 +1690,17 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
/* NOP */
 }
 
-static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
+static void gen7_oa_enable(struct drm_i915_private *dev_priv)
 {
-   lockdep_assert_held(&dev_priv->perf.hook_lock);
+   /* Reset buf pointers so we don't forward reports from before now.
+*
+* Think carefully if considering trying to avoid this, since it
+* also ensures status flags and the buffer itself are cleared
+* in error paths, and we have checks for invalid reports based
+* on the assumption that certain fields are written to zeroed
+* memory which this helps maintains.
+*/
+   gen7_init_oa_buffer(dev_priv);
 
if (dev_priv->perf.oa.exclusive_stream->enabled) {
struct i915_gem_context *ctx =
@@ -1715,25 +1723,6 @@ static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
I915_WRITE(GEN7_OACONTROL, 0);
 }
 
-static void gen7_oa_enable(struct drm_i915_private *dev_priv)
-{
-   unsigned long flags;
-
-   /* Reset buf pointers so we don't forward reports from before now.
-*
-* Think carefully if considering trying to avoid this, since it
-* also ensures status flags and the buffer itself are cleared
-* in error paths, and we have checks for invalid reports based
-* on the assumption that certain fields are written to zeroed
-* memory which this helps maintains.
-*/
-   gen7_init_oa_buffer(dev_priv);
-
-   spin_lock_irqsave(&dev_priv->perf.hook_lock, flags);
-   gen7_update_oacontrol_locked(dev_priv);
-   spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
-}
-
 static void gen8_oa_enable(struct drm_i915_private *dev_priv)
 {
u32 report_format = dev_priv->perf.oa.oa_buffer.format;
@@ -3014,7 +3003,6 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 
INIT_LIST_HEAD(&dev_priv->perf.streams);
mutex_init(&dev_priv->perf.lock);
-   spin_lock_init(&dev_priv->perf.hook_lock);
spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
oa_sample_rate_hard_limit =
-- 
2.12.0



[Intel-gfx] [PATCH v4 14/15] drm/i915/perf: per-gen timebase for checking sample freq

2017-04-12 Thread Robert Bragg
An oa_exponent_to_ns() utility and per-gen timebase constants were
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.
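
For reference, the arithmetic can be checked in userspace with a small
sketch. The names below are illustrative rather than the driver's, and the
12MHz figure for SKL is inferred from the quoted ~167ns period. The period
is 2^(exponent + 1) timestamp ticks, so the shortest period is two ticks
and the highest sampling rate is half the timestamp frequency.

```c
#include <stdint.h>

/* Sampling period in ns for a given OA exponent and timestamp frequency.
 * Assumes exponent <= 31 so the 64-bit multiplication cannot overflow. */
uint64_t oa_exponent_to_ns_sketch(uint64_t timestamp_frequency_hz,
				  unsigned int exponent)
{
	return (1000000000ULL * (2ULL << exponent)) / timestamp_frequency_hz;
}

/* Upper bound for the sampling rate: exponent 0 samples every 2 ticks. */
uint64_t oa_max_sample_rate_hz(uint64_t timestamp_frequency_hz)
{
	return timestamp_frequency_hz / 2;
}
```

With exponent 0 this gives 160ns at 12.5MHz (HSW), 166ns at 12MHz (integer
division of 166.67) and 104ns at 19.2MHz (BXT).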

v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Lionel Landwerlin <lionel.g.landwer...@linux.intel.com>
Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 37 ++---
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b8dcf281db53..59dcce3b40a9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2457,6 +2457,7 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
+   int timestamp_frequency;
 
int metrics_set;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 611f996bece7..5de8d57e0b77 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true;
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
- * 160ns is the smallest sampling period we can theoretically program the OA
- * unit with on Haswell, corresponding to 6.25MHz.
+ * The highest sampling frequency we can theoretically program the OA unit
+ * with is always half the timestamp frequency: E.g. 6.25Mhz for Haswell.
+ *
+ * Initialized just before we register the sysctl parameter.
  */
-static int oa_sample_rate_hard_limit = 6250000;
+static int oa_sample_rate_hard_limit;
 
 /* Theoretically we can program the OA unit to sample every 160ns but don't
  * allow that by default unless root...
@@ -2560,6 +2562,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
return ret;
 }
 
+static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent)
+{
+   return div_u64(1000000000ULL * (2ULL << exponent),
+  dev_priv->perf.oa.timestamp_frequency);
+}
+
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @dev_priv: i915 device instance
@@ -2656,16 +2664,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
}
 
/* Theoretically we can program the OA unit to sample
-* every 160ns but don't allow that by default unless
-* root.
-*
-* On Haswell the period is derived from the exponent
-* as:
-*
-*   period = 80ns * 2^(exponent + 1)
+* e.g. every 160ns for HSW, 167ns for BDW/SKL or 104ns
+* for BXT. We don't allow such high sampling
+* frequencies by default unless root.
 */
+
BUILD_BUG_ON(sizeof(oa_period) != 8);
-   oa_period = 80ull * (2ull << value);
+   oa_period = oa_exponent_to_ns(dev_priv, value);
 
/* This check is primarily to ensure that oa_period <=
 * UINT32_MAX (before passing to do_div which only
@@ -2921,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.ops.oa_hw_tail_read =
gen7_oa_hw_tail_read;
 
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
dev_priv->perf.oa.n_builtin_sets =
@@ -2934,6 +2941,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 */
 
if (IS_GEN8(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120;
dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce;
dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25);
@@ -2950,6 +2959,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
i

[Intel-gfx] [PATCH v4 12/15] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-12 Thread Robert Bragg
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  45 +-
 drivers/gpu/drm/i915/i915_gem_context.h |   1 +
 drivers/gpu/drm/i915/i915_perf.c| 949 +---
 drivers/gpu/drm/i915/i915_reg.h |  22 +
 drivers/gpu/drm/i915/intel_lrc.c|   5 +
 include/uapi/drm/i915_drm.h |  19 +-
 6 files changed, 948 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 13b9125cacdd..b8dcf281db53 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2061,9 +2061,17 @@ struct i915_oa_ops {
void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
 
/**
-* @enable_metric_set: Applies any MUX configuration to set up the
-* Boolean and Custom (B/C) counters that are part of the counter
-* reports being sampled. May apply system constraints such as
+* @select_metric_set: The auto generated code that checks whether a
+* requested OA config is applicable to the system and if so sets up
+* the mux, oa and flex eu register config pointers according to the
+* current dev_priv->perf.oa.metrics_set.
+*/
+   int (*select_metric_set)(struct drm_i915_private *dev_priv);
+
+   /**
+* @enable_metric_set: Selects and applies any MUX configuration to set
+* up the Boolean and Custom (B/C) counters that are part of the
+* counter reports being sampled. May apply system constraints such as
 * disabling EU clock gating as required.
 */
int (*enable_metric_set)(struct drm_i915_private *dev_priv);
@@ -2094,20 +2102,13 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_check: Check for OA buffer data + update tail
-*
-* This is either called via fops or the poll check hrtimer (atomic
-* ctx) without any locks taken.
+* @oa_hw_tail_read: read the OA tail pointer register
 *
-* It's safe to read OA config state here unlocked, assuming that this
-* is only called while the stream is enabled, while the global OA
-* configuration can't be modified.
-*
-* Efficiency is more important than avoiding some 

[Intel-gfx] [PATCH v4 11/15] drm/i915/perf: Add 'render basic' Gen8+ OA unit configs

2017-04-12 Thread Robert Bragg
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

v2: add newlines to debug messages + fix comment (Matthew Auld)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/Makefile |   8 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +
 drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++
 drivers/gpu/drm/i915/i915_oa_bdw.h|  38 
 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 +
 drivers/gpu/drm/i915/i915_oa_bxt.h|  38 
 drivers/gpu/drm/i915/i915_oa_chv.c| 225 
 drivers/gpu/drm/i915/i915_oa_chv.h|  38 
 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h |  38 
 14 files changed, 1791 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 2cf04504e494..41400a138a1e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -127,7 +127,13 @@ i915-y += i915_vgpu.o
 
 # perf code
 i915-y += i915_perf.o \
- i915_oa_hsw.o
+ i915_oa_hsw.o \
+ i915_oa_bdw.o \
+ i915_oa_chv.o \
+ i915_oa_sklgt2.o \
+ i915_oa_sklgt3.o \
+ i915_oa_sklgt4.o \
+ i915_oa_bxt.o
 
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a0e34934a11f..13b9125cacdd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2463,6 +2463,8 @@ struct drm_i915_private {
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+   const struct i915_oa_reg *flex_regs;
+   int flex_regs_len;
 
struct {
struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c
new file mode 100644
index 000000000000..b0b1b75fb431
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_bdw.c
@@ -0,0 +1,380 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "i915_drv.h"
+#include "i915_oa_bdw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_bdw = 1;

[Intel-gfx] [PATCH v4 10/15] drm/i915: expose _SUBSLICE_MASK GETPARM

2017-04-12 Thread Robert Bragg
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices enabled. This information is required,
for example, to be able to analyse some OA counter reports where the
counter configuration depends on the HW sub slice configuration.
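
As an illustration of how userspace might consume the two masks (a sketch
only; the helper names are hypothetical and the GETPARAM ioctl calls are
left out), the enabled counts fall out of a population count, and the
uniform-mask assumption gives the total number of subslices:

```c
#include <stdint.h>

/* Count the set bits in a slice or subslice mask. */
unsigned int mask_count(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Total enabled subslices, assuming every enabled slice has the same
 * subslice mask (the assumption stated above). */
unsigned int total_subslices(uint32_t slice_mask, uint32_t subslice_mask)
{
	return mask_count(slice_mask) * mask_count(subslice_mask);
}
```

For example, a slice mask of 0x3 with a subslice mask of 0x7 describes two
slices of three subslices each, six in total.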

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 76a724b7cc22..133ab46bf2f2 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -362,6 +362,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
if (!value)
return -ENODEV;
break;
+   case I915_PARAM_SUBSLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.subslice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 99bfc3648454..689fd7a418a7 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -415,6 +415,11 @@ typedef struct drm_i915_irq_wait {
 /* Query the mask of slices available for this system */
 #define I915_PARAM_SLICE_MASK   45
 
+/* Assuming it's uniform for each slice, this queries the mask of subslices
+ * per-slice for this system.
+ */
+#define I915_PARAM_SUBSLICE_MASK46
+
 typedef struct drm_i915_getparam {
__s32 param;
/*
-- 
2.12.0



[Intel-gfx] [PATCH v4 09/15] drm/i915: expose _SLICE_MASK GETPARM

2017-04-12 Thread Robert Bragg
Enables userspace to determine the number of slices enabled and also
know what specific slices are enabled. This information is required, for
example, to be able to analyse some OA counter reports where the counter
configuration depends on the HW slice configuration.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index bd85e3826b72..76a724b7cc22 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -357,6 +357,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 */
value = 1;
break;
+   case I915_PARAM_SLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.slice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9ee06ec8a2d6..99bfc3648454 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -412,6 +412,9 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_FENCE   44
 
+/* Query the mask of slices available for this system */
+#define I915_PARAM_SLICE_MASK   45
+
 typedef struct drm_i915_getparam {
__s32 param;
/*
-- 
2.12.0



[Intel-gfx] [PATCH v4 08/15] drm/i915/perf: rate limit spurious oa report notice

2017-04-12 Thread Robert Bragg
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.
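
The throttling behaviour being relied on can be pictured with a small
userspace sketch. This is not the kernel's ratelimit implementation, and
the struct and function names are hypothetical: allow up to `burst`
messages per interval, count everything else as missed, and leave
reporting of the missed count to the caller (here that would be when the
stream is closed).

```c
#include <stdint.h>

/* Hypothetical rate-limit state: at most `burst` messages per interval. */
struct rl_state {
	uint64_t interval_ns;     /* length of one rate-limit window */
	unsigned int burst;       /* messages allowed per window */
	uint64_t window_start_ns; /* start of the current window */
	unsigned int printed;     /* messages allowed in this window */
	unsigned int missed;      /* messages suppressed so far */
};

/* Returns 1 if a message may be printed at time now_ns, 0 if it should
 * be suppressed. Suppressed messages accumulate in rs->missed until the
 * caller reports and resets the count. */
int rl_allow(struct rl_state *rs, uint64_t now_ns)
{
	if (now_ns - rs->window_start_ns >= rs->interval_ns) {
		rs->window_start_ns = now_ns;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}

	rs->missed++;
	return 0;
}
```

With a 5 second interval and a burst of 10, a run of 25 invalid reports
seen back to back produces 10 notices and a missed count of 15.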

v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  6 ++
 drivers/gpu/drm/i915/i915_perf.c | 28 +++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 088c4c60bd38..a0e34934a11f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2448,6 +2448,12 @@ struct drm_i915_private {
wait_queue_head_t poll_wq;
bool pollin;
 
+   /**
+* For rate limiting any notifications of spurious
+* invalid OA reports
+*/
+   struct ratelimit_state spurious_report_rs;
+
bool periodic;
int period_exponent;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5738b99caa5b..3277a52ce98e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_NOTE("Skipping spurious, invalid OA report\n");
+   if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs))
+   DRM_NOTE("Skipping spurious, invalid OA report\n");
continue;
}
 
@@ -913,6 +914,11 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
oa_put_render_ctx_id(stream);
 
dev_priv->perf.oa.exclusive_stream = NULL;
+
+   if (dev_priv->perf.oa.spurious_report_rs.missed) {
+   DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
+dev_priv->perf.oa.spurious_report_rs.missed);
+   }
 }
 
 static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
@@ -1268,6 +1274,26 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
return -EINVAL;
}
 
+   /* We set up some ratelimit state to potentially throttle any _NOTES
+* about spurious, invalid OA reports which we don't forward to
+* userspace.
+*
+* The initialization is associated with opening the stream (not driver
+* init) considering we print a _NOTE about any throttling when closing
+* the stream instead of waiting until driver _fini which no one would
+* ever see.
+*
+* Using the same limiting factors as printk_ratelimit()
+*/
+   ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs,
+5 * HZ, 10);
+   /* Since we use a DRM_NOTE for spurious reports it would be
+* inconsistent to let __ratelimit() automatically print a warning for
+* throttling.
+*/
+   ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs,
+   RATELIMIT_MSG_ON_RELEASE);
+
stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
-- 
2.12.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v4 04/15] drm/i915/perf: no head/tail ref in gen7_oa_read

2017-04-12 Thread Robert Bragg
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 51 +++-
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f47d1cc2144b..83dc67a635fb 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -420,8 +420,6 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @head_ptr: (inout): the current oa buffer cpu read position
- * @tail: the current oa buffer gpu write position
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -439,9 +437,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  char __user *buf,
  size_t count,
- size_t *offset,
- u32 *head_ptr,
- u32 tail)
+ size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -449,14 +445,15 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
int tail_margin = dev_priv->perf.oa.tail_margin;
u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
-   u32 head;
+   size_t start_offset = *offset;
+   u32 head, oastatus1, tail;
u32 taken;
int ret = 0;
 
if (WARN_ON(!stream->enabled))
return -EIO;
 
-   head = *head_ptr - gtt_offset;
+   head = dev_priv->perf.oa.oa_buffer.head - gtt_offset;
 
/* An out of bounds or misaligned head pointer implies a driver bug
 * since we are in full control of head pointer which should only
@@ -467,7 +464,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  "Inconsistent OA buffer head pointer = %u\n", head))
return -EIO;
 
-   tail -= gtt_offset;
+   oastatus1 = I915_READ(GEN7_OASTATUS1);
+   tail = (oastatus1 & GEN7_OASTATUS1_TAIL_MASK) - gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
 * buffer size
@@ -477,8 +475,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
-   *head_ptr = I915_READ(GEN7_OASTATUS2) &
-   GEN7_OASTATUS2_HEAD_MASK;
return -EIO;
}
 
@@ -542,7 +538,18 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
report32[0] = 0;
}
 
-   *head_ptr = gtt_offset + head;
+
+   if (start_offset != *offset) {
+   /* We removed the gtt_offset for the copy loop above, indexing
+* relative to oa_buf_base so put back here...
+*/
+   head += gtt_offset;
+
+   I915_WRITE(GEN7_OASTATUS2,
+  ((head & GEN7_OASTATUS2_HEAD_MASK) |
+   OA_MEM_SELECT_GGTT));
+   dev_priv->perf.oa.oa_buffer.head = head;
+   }
 
return ret;
 }
@@ -570,8 +577,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
u32 oastatus1;
-   u32 head;
-   u32 tail;
int ret;
 
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
@@ -579,9 +584,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = dev_priv->perf.oa.oa_buffer.head;
-   tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
-
/* XXX: On Haswell we don't have a safe way to clear oastatus1
 

[Intel-gfx] [PATCH v4 07/15] drm/i915/perf: better pipeline aged/aging tail updates

2017-04-12 Thread Robert Bragg
This updates the tail pointer race workaround to update the 'aged'
pointer before looking to start aging a new one. New data may already
be available, in which case we can immediately start aging a new
pointer without first having to wait for a later hrtimer callback (and
then another to age).

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 41 ++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 08cc2b0dd734..5738b99caa5b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -391,6 +391,29 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 
now = ktime_get_mono_fast_ns();
 
+   /* Update the aged tail
+*
+* Flip the tail pointer available for read()s once the aging tail is
+* old enough to trust that the corresponding data will be visible to
+* the CPU...
+*
+* Do this before updating the aging pointer in case we may be able to
+* immediately start aging a new pointer too (if new data has become
+* available) without needing to wait for a later hrtimer callback.
+*/
+   if (aging_tail != INVALID_TAIL_PTR &&
+   ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
+OA_TAIL_MARGIN_NSEC)) {
+   aged_idx ^= 1;
+   dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
+
+   aged_tail = aging_tail;
+
+   /* Mark that we need a new pointer to start aging... */
+   dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
+   aging_tail = INVALID_TAIL_PTR;
+   }
+
/* Update the aging tail
 *
 * We throttle aging tail updates until we have a new tail that
@@ -420,24 +443,6 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
}
}
 
-   /* Update the aged tail
-*
-* Flip the tail pointer available for read()s once the aging tail is
-* old enough to trust that the corresponding data will be visible to
-* the CPU...
-*/
-   if (aging_tail != INVALID_TAIL_PTR &&
-   ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
-OA_TAIL_MARGIN_NSEC)) {
-   aged_idx ^= 1;
-   dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
-
-   aged_tail = aging_tail;
-
-   /* Mark that we need a new pointer to start aging... */
-   dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
-   }
-
spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
return aged_tail == INVALID_TAIL_PTR ?
-- 
2.12.0



[Intel-gfx] [PATCH v4 01/15] drm/i915/perf: fix gen7_append_oa_reports comment

2017-04-12 Thread Robert Bragg
If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 060b171480d5..78fef53b45c9 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -431,7 +431,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * userspace.
  *
  * Note: reports are consumed from the head, and appended to the
- * tail, so the head chases the tail?... If you think that's mad
+ * tail, so the tail chases the head?... If you think that's mad
  * and back-to-front you're not alone, but this follows the
  * Gen PRM naming convention.
  *
-- 
2.12.0



[Intel-gfx] [PATCH v4 05/15] drm/i915/perf: improve tail race workaround

2017-04-12 Thread Robert Bragg
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
 1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
 2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.
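
The two-pointer scheme described above can be sketched in userspace C as
follows. This is a simplified illustration, not the driver code: the struct
oa_tails layout and the oa_tails_init() and oa_buffer_check() signatures are
hypothetical, buffer wrap-around and locking are ignored, the order of the
two steps is an arbitrary choice for the sketch, and the 100us value given
to OA_TAIL_MARGIN_NSEC is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_TAIL_PTR    0xffffffffu
#define OA_TAIL_MARGIN_NSEC 100000ULL /* assumed margin: 100us */

/* Hypothetical container for the two tail pointers. */
struct oa_tails {
	uint32_t aged_tail;       /* trusted: safe to use for read() */
	uint32_t aging_tail;      /* recently sampled, not yet trusted */
	uint64_t aging_timestamp; /* when aging_tail was sampled (ns) */
};

static void oa_tails_init(struct oa_tails *t)
{
	assert(t);
	t->aged_tail = INVALID_TAIL_PTR;
	t->aging_tail = INVALID_TAIL_PTR;
	t->aging_timestamp = 0;
}

/*
 * Called at a limited rate (e.g. from a timer callback) with the tail
 * pointer just read from hardware. Returns true if there is trusted data
 * between head and the aged tail.
 */
static bool oa_buffer_check(struct oa_tails *t, uint32_t hw_tail,
			    uint32_t head, uint64_t now_ns)
{
	assert(t);

	/* Promote the aging tail once it is old enough to trust. */
	if (t->aging_tail != INVALID_TAIL_PTR &&
	    now_ns - t->aging_timestamp > OA_TAIL_MARGIN_NSEC) {
		t->aged_tail = t->aging_tail;
		t->aging_tail = INVALID_TAIL_PTR;
	}

	/* Start aging a new tail if none is aging and the HW moved on. */
	if (t->aging_tail == INVALID_TAIL_PTR && hw_tail != t->aged_tail) {
		t->aging_tail = hw_tail;
		t->aging_timestamp = now_ns;
	}

	return t->aged_tail != INVALID_TAIL_PTR && t->aged_tail != head;
}
```

Because the decision depends only on elapsed time since a tail value was
sampled, the sketch needs no knowledge of the periodic sampling frequency,
which is the property the commit message relies on.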

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  60 -
 drivers/gpu/drm/i915/i915_perf.c | 277 ++-
 2 files changed, 241 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2f8a7a4f29df..088c4c60bd38 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2094,7 +2094,7 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_is_empty: Check if OA buffer empty (false positives OK)
+* @oa_buffer_check: Check for OA buffer data + update tail
 *
 * This is either called via fops or the poll check hrtimer (atomic
 * ctx) without any locks taken.
@@ -2107,7 +2107,7 @@ struct i915_oa_ops {
 * here, which will be handled gracefully - likely resulting in an
 * %EAGAIN error for userspace.
 */
-   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+   bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
 };
 
 struct intel_cdclk_state {
@@ -2450,9 +2450,6 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
-   int timestamp_frequency;
-
-   int tail_margin;
 
int metrics_set;
 
@@ -2468,6 +2465,59 @@ struct drm_i915_private {
int format_size;
 
/**
+* Locks reads and writes to all head/tail state
+*
+* Consider: the head and tail pointer state
+* needs to be read consistently from a hrtimer
+* callback (atomic context) and read() fop
+* (user context) with tail pointer updates
+* happening in atomic context and head updates
+* in user context and the (unlikely)
+* possibility of read() errors needing to
+* reset all head/tail state.
+*
+* Note: Contention or performance aren't
+* currently a significant concern here
+* considering the relatively low frequency of
+* hrtimer callbacks (5ms period) and that
+* reads typically only happen in response to a
+* hrtimer event and likely complete before the
+* n

[Intel-gfx] [PATCH v4 06/15] drm/i915/perf: improve invalid OA format debug message

2017-04-12 Thread Robert Bragg
A minor improvement to debugging output

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 18734e1926b9..08cc2b0dd734 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1904,11 +1904,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
break;
case DRM_I915_PERF_PROP_OA_FORMAT:
if (value == 0 || value >= I915_OA_FORMAT_MAX) {
-   DRM_DEBUG("Invalid OA report format\n");
+   DRM_DEBUG("Out-of-range OA report format %llu\n",
+ value);
return -EINVAL;
}
if (!dev_priv->perf.oa.oa_formats[value].size) {
-   DRM_DEBUG("Invalid OA report format\n");
+   DRM_DEBUG("Unsupported OA report format %llu\n",
+ value);
return -EINVAL;
}
props->oa_format = value;
-- 
2.12.0



[Intel-gfx] [PATCH v4 02/15] drm/i915/perf: avoid poll, read, EAGAIN busy loops

2017-04-12 Thread Robert Bragg
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.
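
The back-off can be illustrated with a small userspace sketch. The names
oa_stream_state, poll_check_timer_cb and finish_read are hypothetical
stand-ins for the driver's .pollin flag, its hrtimer callback and the tail
end of its read() path; only the condition for clearing the flag is taken
from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's POLLIN state. */
struct oa_stream_state {
	bool pollin; /* set by the periodic check, reported by poll() */
};

/* Periodic check: may report false positives about available data. */
static void poll_check_timer_cb(struct oa_stream_state *s,
				bool buffer_maybe_has_data)
{
	assert(s);
	if (buffer_maybe_has_data)
		s->pollin = true;
}

/*
 * End of the read() path: clear pollin after a successful read and also
 * after -EAGAIN, so poll() cannot immediately report the same false
 * positive again. Other errors leave the flag untouched.
 */
static int finish_read(struct oa_stream_state *s, int ret)
{
	assert(s);
	if (ret >= 0 || ret == -EAGAIN)
		s->pollin = false;
	return ret;
}
```

With this shape, a poll()/read() loop in userspace sleeps until the next
timer callback sets the flag again, instead of spinning on -EAGAIN.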

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 78fef53b45c9..f59f6dd20922 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1352,7 +1352,15 @@ static ssize_t i915_perf_read(struct file *file,
mutex_unlock(&dev_priv->perf.lock);
}
 
-   if (ret >= 0) {
+   /* We allow the poll checking to sometimes report false positive POLLIN
+* events where we might actually report EAGAIN on read() if there's
+* not really any data available. In this situation though we don't
+* want to enter a busy loop between poll() reporting a POLLIN event
+* and read() returning -EAGAIN. Clearing the oa.pollin state here
+* effectively ensures we back off until the next hrtimer callback
+* before reporting another POLLIN event.
+*/
+   if (ret >= 0 || ret == -EAGAIN) {
/* Maybe make ->pollin per-stream state if we support multiple
 * concurrent streams in the future.
 */
-- 
2.12.0



[Intel-gfx] [PATCH v4 03/15] drm/i915/perf: avoid read back of head register

2017-04-12 Thread Robert Bragg
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.
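
Since the head pointer is now purely software state, its consistency check
reduces to a bounds and alignment test. A minimal sketch, assuming a 16MB
OA buffer and using a hypothetical helper name oa_head_is_consistent(); the
condition itself mirrors the WARN_ONCE() check added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OA_BUFFER_SIZE (16u * 1024 * 1024) /* assumed: 16MB buffer */

/*
 * The driver only ever advances head by whole reports, so an out of
 * bounds or misaligned value points at a software bug rather than at
 * spurious hardware behaviour. head is relative to the buffer start.
 */
static bool oa_head_is_consistent(uint32_t head, uint32_t report_size)
{
	assert(report_size != 0);
	return head <= OA_BUFFER_SIZE && (head % report_size) == 0;
}
```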

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 11 ++
 drivers/gpu/drm/i915/i915_perf.c | 46 ++--
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1af4e6f5410c..2f8a7a4f29df 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2466,6 +2466,17 @@ struct drm_i915_private {
u8 *vaddr;
int format;
int format_size;
+
+   /**
+* Although we can always read back the head
+* pointer register, we prefer to avoid
+* trusting the HW state, just to avoid any
+* risk that some hardware condition could
+* somehow bump the head pointer unpredictably
+* and cause us to forward the wrong OA buffer
+* data to userspace.
+*/
+   u32 head;
} oa_buffer;
 
u32 gen7_latched_oastatus1;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f59f6dd20922..f47d1cc2144b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -322,9 +322,8 @@ struct perf_open_properties {
 static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_priv)
 {
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
-   u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   u32 head = dev_priv->perf.oa.oa_buffer.head;
u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
return OA_TAKEN(tail, head) <
@@ -458,16 +457,24 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
return -EIO;
 
head = *head_ptr - gtt_offset;
+
+   /* An out of bounds or misaligned head pointer implies a driver bug
+* since we are in full control of head pointer which should only
+* be incremented by multiples of the report size (notably also
+* all a power of two).
+*/
+   if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size,
+ "Inconsistent OA buffer head pointer = %u\n", head))
+   return -EIO;
+
tail -= gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
-* buffer size and since we should never write a misaligned head
-* pointer we don't expect to read one back either...
+* buffer size
 */
-   if (tail > OA_BUFFER_SIZE || head > OA_BUFFER_SIZE ||
-   head % report_size) {
-   DRM_ERROR("Inconsistent OA buffer pointer (head = %u, tail = %u): force restart\n",
- head, tail);
+   if (tail > OA_BUFFER_SIZE) {
+   DRM_ERROR("Inconsistent OA buffer tail pointer = %u: force restart\n",
+ tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
*head_ptr = I915_READ(GEN7_OASTATUS2) &
@@ -562,8 +569,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
-   int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2;
u32 oastatus1;
u32 head;
u32 tail;
@@ -572,10 +577,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
return -EIO;
 
-   oastatus2 = I915_READ(GEN7_OASTATUS2);
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   head = dev_priv->perf.oa.oa_buffer.head;
tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
/* XXX: On Haswell we don't have a safe way to clear oastatus1
@@ -616,10 +620,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
   

[Intel-gfx] [PATCH v4 00/15] Enable OA unit for Gen 8 and 9 in i915 perf

2017-04-12 Thread Robert Bragg
Updates based on the latest review feedback from Matthew and Lionel, including
an update to the TestOa register config for SKL GT2 compared to the last series
(based on the latest XML files from VPG).

Although the _[SUB]SLICE_MASK GETPARM patches were reviewed, it's worth
mentioning there was a TODO comment in the last patches about rebasing the
parameter ID before upstreaming which is removed now, and there's a minimal
comment for what the new parameters are, consistent with other more recently
added parameters. Conveniently the actual IDs didn't need rebasing so there's
no last moment uapi change for gputop, mesa and igt.

The series is longer just because I've included the gen7 prep patches (already
reviewed) that I haven't landed yet but the gen8+ bits depend on.

Regards,
- Robert

Robert Bragg (15):
  drm/i915/perf: fix gen7_append_oa_reports comment
  drm/i915/perf: avoid poll, read, EAGAIN busy loops
  drm/i915/perf: avoid read back of head register
  drm/i915/perf: no head/tail ref in gen7_oa_read
  drm/i915/perf: improve tail race workaround
  drm/i915/perf: improve invalid OA format debug message
  drm/i915/perf: better pipeline aged/aging tail updates
  drm/i915/perf: rate limit spurious oa report notice
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
  drm/i915/perf: Add OA unit support for Gen 8+
  drm/i915/perf: Add more OA configs for BDW, CHV, SKL + BXT
  drm/i915/perf: per-gen timebase for checking sample freq
  drm/i915/perf: remove perf.hook_lock

 drivers/gpu/drm/i915/Makefile   |8 +-
 drivers/gpu/drm/i915/i915_drv.c |   10 +
 drivers/gpu/drm/i915/i915_drv.h |  121 +-
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c  | 5154 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c  | 2541 +++
 drivers/gpu/drm/i915/i915_oa_bxt.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c  | 2730 
 drivers/gpu/drm/i915/i915_oa_chv.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_hsw.c  |   58 +-
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3302 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c| 1341 ++--
 drivers/gpu/drm/i915/i915_reg.h |   22 +
 drivers/gpu/drm/i915/intel_lrc.c|5 +
 include/uapi/drm/i915_drm.h |   27 +-
 21 files changed, 21076 insertions(+), 238 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

-- 
2.12.0



Re: [Intel-gfx] [PATCH v2] drm/i915/perf: per-gen timebase for checking sample freq

2017-04-12 Thread Robert Bragg
On Wed, Apr 12, 2017 at 1:34 PM, Matthew Auld <
matthew.william.a...@gmail.com> wrote:

> On 5 April 2017 at 20:05, Robert Bragg <rob...@sixbynine.org> wrote:
> > An oa_exponent_to_ns() utility and per-gen timebase constants where
> were
>
> > recently removed when updating the tail pointer race condition WA, and
> > this restores those so we can update the _PROP_OA_EXPONENT validation
> > done in read_properties_unlocked() to not assume we have a 12.5MHz
> > timebase as we did for Haswell.
> >
> > Accordingly the oa_sample_rate_hard_limit value that's referenced by
> > proc_dointvec_minmax defining the absolute limit for the OA sampling
> > frequency is now initialized to (timestamp_frequency / 2) instead of the
> > 6.25MHz constant for Haswell.
> >
> > v2:
> > Specify frequency of 19.2MHz for BXT (Ville)
> > Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > Cc: Lionel Landwerlin <lionel.g.landwer...@linux.intel.com>
> > Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h  |  1 +
> >  drivers/gpu/drm/i915/i915_perf.c | 31 ++-
> >  2 files changed, 23 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 3a22b6fd0ee6..48b07d706f06 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2463,6 +2463,7 @@ struct drm_i915_private {
> >
> > bool periodic;
> > int period_exponent;
> > +   int timestamp_frequency;
> >
> > int metrics_set;
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > index 98eb6415b63a..980b4a1fd7cc 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true;
> >
> >  /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
> >   *
> > - * 160ns is the smallest sampling period we can theoretically program
> the OA
> > - * unit with on Haswell, corresponding to 6.25MHz.
> > + * The highest sampling frequency we can theoretically program the OA
> unit
> > + * with is always half the timestamp frequency: E.g. 6.25Mhz for
> Haswell.
> > + *
> > + * Initialized just before we register the sysctl parameter.
> >   */
> > -static int oa_sample_rate_hard_limit = 6250000;
> > +static int oa_sample_rate_hard_limit;
> >
> >  /* Theoretically we can program the OA unit to sample every 160ns but
> don't
> >   * allow that by default unless root...
> > @@ -2549,6 +2551,12 @@ i915_perf_open_ioctl_locked(struct
> drm_i915_private *dev_priv,
> > return ret;
> >  }
> >
> > +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int
> exponent)
> > +{
> > +   return div_u64(1000000000ULL * (2ULL << exponent),
> > +  dev_priv->perf.oa.timestamp_frequency);
> > +}
> > +
> >  /**
> >   * read_properties_unlocked - validate + copy userspace stream open
> properties
> >   * @dev_priv: i915 device instance
> > @@ -2647,14 +2655,9 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> > /* Theoretically we can program the OA unit to
> sample
> >  * every 160ns but don't allow that by default
> unless
> hmm, that's not actually true if we consider BXT, right?
>

right, I've updated this comment now.



>
> Reviewed-by: Matthew Auld <matthew.a...@intel.com>
>

thanks


Re: [Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-12 Thread Robert Bragg
On Wed, Apr 12, 2017 at 12:33 PM, Matthew Auld <matthew.william.auld@gmail.com> wrote:

> On 04/05, Robert Bragg wrote:
> > Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
> > share (more-or-less) the same OA unit design.
> >
> > Of particular note in comparison to Haswell: some OA unit HW config
> > state has become per-context state and as a consequence it is somewhat
> > more complicated to manage synchronous state changes from the cpu while
> > there's no guarantee of what context (if any) is currently actively
> > running on the gpu.
> >
> > The periodic sampling frequency which can be particularly useful for
> > system-wide analysis (as opposed to command stream synchronised
> > MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
> > have become per-context save and restored (while the OABUFFER
> > destination is still a shared, system-wide resource).
> >
> > This support for gen8+ takes care to consider a number of timing
> > challenges involved in synchronously updating per-context state
> > primarily by programming all config state from the cpu and updating all
> > current and saved contexts synchronously while the OA unit is still
> > disabled.
> >
> > The driver intentionally avoids depending on command streamer
> > programming to update OA state considering the lack of synchronization
> > between the automatic loading of OACTXCONTROL state (that includes the
> > periodic sampling state and enable state) on context restore and the
> > parsing of any general purpose BB the driver can control. I.e. this
> > implementation is careful to avoid the possibility of a context restore
> > temporarily enabling any out-of-date periodic sampling state. In
> > addition to the risk of transiently-out-of-date state being loaded
> > automatically; there are also internal HW latencies involved in the
> > loading of MUX configurations which would be difficult to account for
> > from the command streamer (and we only want to enable the unit when once
> > the MUX configuration is complete).
> >
> > Since the Gen8+ OA unit design no longer supports clock gating the unit
> > off for a single given context (which effectively stopped any progress
> > of counters while any other context was running) and instead supports
> > tagging OA reports with a context ID for filtering on the CPU, it means
> > we can no longer hide the system-wide progress of counters from a
> > non-privileged application only interested in metrics for its own
> > context. Although we could theoretically try and subtract the progress
> > of other contexts before forwarding reports via read() we aren't in a
> > position to filter reports captured via MI_REPORT_PERF_COUNT commands.
> > As a result, for Gen8+, we always require the
> > dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
> > if not root.
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  45 +-
> >  drivers/gpu/drm/i915/i915_gem_context.h |   1 +
> >  drivers/gpu/drm/i915/i915_perf.c| 938
> +---
> >  drivers/gpu/drm/i915/i915_reg.h |  22 +
> >  drivers/gpu/drm/i915/intel_lrc.c|   5 +
> >  include/uapi/drm/i915_drm.h |  19 +-
> >  6 files changed, 937 insertions(+), 93 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index 9c37b73ac7ac..3a22b6fd0ee6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2067,9 +2067,17 @@ struct i915_oa_ops {
> >   void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
> >
> >   /**
> > -  * @enable_metric_set: Applies any MUX configuration to set up the
> > -  * Boolean and Custom (B/C) counters that are part of the counter
> > -  * reports being sampled. May apply system constraints such as
> > +  * @select_metric_set: The auto generated code that checks whether
> a
> > +  * requested OA config is applicable to the system and if so sets
> up
> > +  * the mux, oa and flex eu register config pointers according to
> the
> > +  * current dev_priv->perf.oa.metrics_set.
> > +  */
> > + int (*select_metric_set)(struct drm_i915_private *dev_priv);
> > +
> > + /**
> > +  * @enable_metric_set: Selects and applies any MUX configuration
> to set
> > +  * up the Boolean and Custom (B/C) counters that are part of the
> >

[Intel-gfx] [PATCH v2] drm/i915/perf: per-gen timebase for checking sample freq

2017-04-05 Thread Robert Bragg
An oa_exponent_to_ns() utility and per-gen timebase constants were
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.
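
For readers following along, here is a minimal userspace sketch of the
arithmetic described above. It is not the driver code: the function names
are hypothetical, and it assumes the frequency is given in Hz and that the
OA unit samples every 2^(exponent + 1) timestamp ticks (as in the Haswell
formula period = 80ns * 2^(exponent + 1) for a 12.5MHz timebase).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the helper in this patch: convert an OA
 * exponent into a sampling period in nanoseconds, assuming one sample
 * every 2^(exponent + 1) timestamp ticks.
 */
uint64_t oa_exponent_to_ns_sketch(uint64_t timestamp_frequency_hz,
				  unsigned int exponent)
{
	assert(timestamp_frequency_hz > 0);
	assert(exponent < 32); /* keeps 1e9 * 2^(exponent + 1) within 64 bits */

	return (1000000000ULL * (2ULL << exponent)) / timestamp_frequency_hz;
}

/* The shortest period is exponent 0, i.e. two ticks, so the ceiling on the
 * sampling frequency is half the timestamp frequency.
 */
uint64_t oa_sample_rate_hard_limit_sketch(uint64_t timestamp_frequency_hz)
{
	return timestamp_frequency_hz / 2;
}
```

With a 12.5MHz timebase, exponent 0 gives the 160ns minimum period and the
hard limit comes out at 6.25MHz, matching the Haswell constants being
replaced.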

v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Lionel Landwerlin <lionel.g.landwer...@linux.intel.com>
Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 31 ++-
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3a22b6fd0ee6..48b07d706f06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2463,6 +2463,7 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
+   int timestamp_frequency;
 
int metrics_set;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 98eb6415b63a..980b4a1fd7cc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -288,10 +288,12 @@ static u32 i915_perf_stream_paranoid = true;
 
 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
  *
- * 160ns is the smallest sampling period we can theoretically program the OA
- * unit with on Haswell, corresponding to 6.25MHz.
+ * The highest sampling frequency we can theoretically program the OA unit
+ * with is always half the timestamp frequency: E.g. 6.25Mhz for Haswell.
+ *
+ * Initialized just before we register the sysctl parameter.
  */
-static int oa_sample_rate_hard_limit = 6250000;
+static int oa_sample_rate_hard_limit;
 
 /* Theoretically we can program the OA unit to sample every 160ns but don't
  * allow that by default unless root...
@@ -2549,6 +2551,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
return ret;
 }
 
+static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent)
+{
+   return div_u64(1000000000ULL * (2ULL << exponent),
+  dev_priv->perf.oa.timestamp_frequency);
+}
+
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @dev_priv: i915 device instance
@@ -2647,14 +2655,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
/* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* On Haswell the period is derived from the exponent
-* as:
-*
-*   period = 80ns * 2^(exponent + 1)
 */
BUILD_BUG_ON(sizeof(oa_period) != 8);
-   oa_period = 80ull * (2ull << value);
+   oa_period = oa_exponent_to_ns(dev_priv, value);
 
/* This check is primarily to ensure that oa_period <=
 * UINT32_MAX (before passing to do_div which only
@@ -2910,6 +2913,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.ops.oa_hw_tail_read =
gen7_oa_hw_tail_read;
 
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
dev_priv->perf.oa.n_builtin_sets =
@@ -2923,6 +2928,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 */
 
if (IS_GEN8(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120;
dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce;
dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25);
@@ -2939,6 +2946,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
i915_oa_select_metric_set_chv;
}
} else if (IS_GEN9(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 12000000;
+
dev_priv->perf.oa.ctx_oactxctrl_offset = 0x128;
dev_priv->perf.oa.ctx_flexeu0_offset = 0x3de;
dev_priv->p

Re: [Intel-gfx] [PATCH v3 6/7] drm/i915/perf: per-gen timebase for checking sample freq

2017-04-05 Thread Robert Bragg
On Wed, Apr 5, 2017 at 6:26 PM, Ville Syrjälä <ville.syrj...@linux.intel.com
> wrote:

> On Wed, Apr 05, 2017 at 06:17:36PM +0100, Lionel Landwerlin wrote:
> > On 05/04/17 18:06, Ville Syrjälä wrote:
> > > On Wed, Apr 05, 2017 at 05:23:19PM +0100, Robert Bragg wrote:
> > >> An oa_exponent_to_ns() utility and per-gen timebase constants were
> > >> recently removed when updating the tail pointer race condition WA, and
> > >> this restores those so we can update the _PROP_OA_EXPONENT validation
> > >> done in read_properties_unlocked() to not assume we have a 12.5MHz
> > >> timebase as we did for Haswell.
> > >>
> > >> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > >> Cc: Lionel Landwerlin <lionel.g.landwer...@linux.intel.com>
> > >> ---
> > >>   drivers/gpu/drm/i915/i915_drv.h  |  1 +
> > >>   drivers/gpu/drm/i915/i915_perf.c | 21 +++--
> > >>   2 files changed, 16 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > >> index 3a22b6fd0ee6..48b07d706f06 100644
> > >> --- a/drivers/gpu/drm/i915/i915_drv.h
> > >> +++ b/drivers/gpu/drm/i915/i915_drv.h
> > >> @@ -2463,6 +2463,7 @@ struct drm_i915_private {
> > >>
> > >>bool periodic;
> > >>int period_exponent;
> > >> +  int timestamp_frequency;
> > >>
> > >>int metrics_set;
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> > >> index 98eb6415b63a..87c0d1ce1b9f 100644
> > >> --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> @@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struct
> drm_i915_private *dev_priv,
> > >>return ret;
> > >>   }
> > >>
> > >> +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int
> exponent)
> > >> +{
> > >> +   return div_u64(1000000000ULL * (2ULL << exponent),
> > >> +  dev_priv->perf.oa.timestamp_frequency);
> > >> +}
> > >> +
> > >>   /**
> > >>* read_properties_unlocked - validate + copy userspace stream open
> properties
> > >>* @dev_priv: i915 device instance
> > >> @@ -2647,14 +2653,9 @@ static int read_properties_unlocked(struct
> drm_i915_private *dev_priv,
> > >>/* Theoretically we can program the OA unit to
> sample
> > >> * every 160ns but don't allow that by default
> unless
> > >> * root.
> > >> -   *
> > >> -   * On Haswell the period is derived from the
> exponent
> > >> -   * as:
> > >> -   *
> > >> -   *   period = 80ns * 2^(exponent + 1)
> > >> */
> > >>BUILD_BUG_ON(sizeof(oa_period) != 8);
> > >> -  oa_period = 80ull * (2ull << value);
> > >> +  oa_period = oa_exponent_to_ns(dev_priv, value);
> > >>
> > >>/* This check is primarily to ensure that
> oa_period <=
> > >> * UINT32_MAX (before passing to do_div which only
> > >> @@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_private
> *dev_priv)
> > >>dev_priv->perf.oa.ops.oa_hw_tail_read =
> > >>gen7_oa_hw_tail_read;
> > >>
> > >> +  dev_priv->perf.oa.timestamp_frequency = 12500000;
> > >> +
> > >>dev_priv->perf.oa.oa_formats = hsw_oa_formats;
> > >>
> > >>dev_priv->perf.oa.n_builtin_sets =
> > >> @@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private
> *dev_priv)
> > >> */
> > >>
> > >>if (IS_GEN8(dev_priv)) {
> > >> +  dev_priv->perf.oa.timestamp_frequency = 12500000;
> > >> +
> > >>dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120;
> > >>dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce;
> > >>dev_priv->perf.oa.gen8_valid_ctx

[Intel-gfx] [PATCH v3 6/7] drm/i915/perf: per-gen timebase for checking sample freq

2017-04-05 Thread Robert Bragg
An oa_exponent_to_ns() utility and per-gen timebase constants were
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Lionel Landwerlin <lionel.g.landwer...@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 21 +++--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3a22b6fd0ee6..48b07d706f06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2463,6 +2463,7 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
+   int timestamp_frequency;
 
int metrics_set;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 98eb6415b63a..87c0d1ce1b9f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
return ret;
 }
 
+static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent)
+{
+   return div_u64(1000000000ULL * (2ULL << exponent),
+  dev_priv->perf.oa.timestamp_frequency);
+}
+
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @dev_priv: i915 device instance
@@ -2647,14 +2653,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
/* Theoretically we can program the OA unit to sample
 * every 160ns but don't allow that by default unless
 * root.
-*
-* On Haswell the period is derived from the exponent
-* as:
-*
-*   period = 80ns * 2^(exponent + 1)
 */
BUILD_BUG_ON(sizeof(oa_period) != 8);
-   oa_period = 80ull * (2ull << value);
+   oa_period = oa_exponent_to_ns(dev_priv, value);
 
/* This check is primarily to ensure that oa_period <=
 * UINT32_MAX (before passing to do_div which only
@@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.ops.oa_hw_tail_read =
gen7_oa_hw_tail_read;
 
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
dev_priv->perf.oa.n_builtin_sets =
@@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 */
 
if (IS_GEN8(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 12500000;
+
dev_priv->perf.oa.ctx_oactxctrl_offset = 0x120;
dev_priv->perf.oa.ctx_flexeu0_offset = 0x2ce;
dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<25);
@@ -2939,6 +2944,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
i915_oa_select_metric_set_chv;
}
} else if (IS_GEN9(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 12000000;
+
dev_priv->perf.oa.ctx_oactxctrl_offset = 0x128;
dev_priv->perf.oa.ctx_flexeu0_offset = 0x3de;
dev_priv->perf.oa.gen8_valid_ctx_bit = (1<<16);
@@ -2959,6 +2966,8 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
dev_priv->perf.oa.ops.select_metric_set =
i915_oa_select_metric_set_sklgt4;
} else if (IS_BROXTON(dev_priv)) {
+   dev_priv->perf.oa.timestamp_frequency = 19200123;
+
dev_priv->perf.oa.n_builtin_sets =
i915_oa_n_builtin_metric_sets_bxt;
dev_priv->perf.oa.ops.select_metric_set =
-- 
2.12.0


[Intel-gfx] [PATCH v3 4/7] drm/i915/perf: Add OA unit support for Gen 8+

2017-04-05 Thread Robert Bragg
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become saved and restored per context (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
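
To make the CPU-side filtering mentioned above concrete, here is a rough
sketch of picking out the reports tagged with one context ID. The report
size, the position of the context ID word and the function name are all
assumptions made for illustration; they are not taken from the text of this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS for illustration only: every report is a fixed-size array of
 * 32-bit words and the context ID tag lives in word 2.
 */
#define SKETCH_REPORT_DWORDS 64
#define SKETCH_CTX_ID_DWORD  2

/* Count the reports in a packed buffer that are tagged with ctx_id. */
size_t sketch_count_reports_for_ctx(const uint32_t *reports, size_t n_reports,
				    uint32_t ctx_id)
{
	size_t matched = 0;

	assert(reports != NULL || n_reports == 0);

	for (size_t i = 0; i < n_reports; i++) {
		const uint32_t *report32 = reports + i * SKETCH_REPORT_DWORDS;

		if (report32[SKETCH_CTX_ID_DWORD] == ctx_id)
			matched++;
	}

	return matched;
}
```

Note that filtering like this only selects reports; the counter values in
them still reflect system-wide progress, which is why the paranoid check is
required.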

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h |  45 +-
 drivers/gpu/drm/i915/i915_gem_context.h |   1 +
 drivers/gpu/drm/i915/i915_perf.c| 938 +---
 drivers/gpu/drm/i915/i915_reg.h |  22 +
 drivers/gpu/drm/i915/intel_lrc.c|   5 +
 include/uapi/drm/i915_drm.h |  19 +-
 6 files changed, 937 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9c37b73ac7ac..3a22b6fd0ee6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2067,9 +2067,17 @@ struct i915_oa_ops {
void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
 
/**
-* @enable_metric_set: Applies any MUX configuration to set up the
-* Boolean and Custom (B/C) counters that are part of the counter
-* reports being sampled. May apply system constraints such as
+* @select_metric_set: The auto generated code that checks whether a
+* requested OA config is applicable to the system and if so sets up
+* the mux, oa and flex eu register config pointers according to the
+* current dev_priv->perf.oa.metrics_set.
+*/
+   int (*select_metric_set)(struct drm_i915_private *dev_priv);
+
+   /**
+* @enable_metric_set: Selects and applies any MUX configuration to set
+* up the Boolean and Custom (B/C) counters that are part of the
+* counter reports being sampled. May apply system constraints such as
 * disabling EU clock gating as required.
 */
int (*enable_metric_set)(struct drm_i915_private *dev_priv);
@@ -2100,20 +2108,13 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_check: Check for OA buffer data + update tail
-*
-* This is either called via fops or the poll check hrtimer (atomic
-* ctx) without any locks taken.
+* @oa_hw_tail_read: read the OA tail pointer register
 *
-* It's safe to read OA config state here unlocked, assuming that this
-* is only called while the stream is enabled, while the global OA
-* configuration can't be modified.
-*
-* Efficiency is more important than avoiding some false positives
-* here, which will be handled gracefully

[Intel-gfx] [PATCH v3 1/7] drm/i915: expose _SLICE_MASK GETPARM

2017-04-05 Thread Robert Bragg
Enables userspace to determine the number of slices enabled and also
know what specific slices are enabled. This information is required, for
example, to be able to analyse some OA counter reports where the counter
configuration depends on the HW slice configuration.
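
As a sketch of how userspace might consume the value once it has it, the
helpers below decode a slice mask. They assume bit N set means slice N is
enabled; the helper names are hypothetical and the getparam ioctl call
itself is left out to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Number of enabled slices: count the set bits in the mask. */
unsigned int count_enabled_slices(uint32_t slice_mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; slice_mask; slice_mask &= slice_mask - 1)
		n++;

	return n;
}

/* Whether one specific slice is enabled (assumes bit N <=> slice N). */
int slice_is_enabled(uint32_t slice_mask, unsigned int slice)
{
	assert(slice < 32);

	return (slice_mask >> slice) & 1u;
}
```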

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5852eed2a867..337acf034d36 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -357,6 +357,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 */
value = 1;
break;
+   case I915_PARAM_SLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.slice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3554495bef13..f47fb7f26f36 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU    38
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
+#define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0


[Intel-gfx] [PATCH v3 7/7] drm/i915/perf: remove perf.hook_lock

2017-04-05 Thread Robert Bragg
In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 --
 drivers/gpu/drm/i915/i915_perf.c | 32 ++--
 2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 48b07d706f06..67ac4e6dbccb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2444,8 +2444,6 @@ struct drm_i915_private {
struct mutex lock;
struct list_head streams;
 
-   spinlock_t hook_lock;
-
struct {
struct i915_perf_stream *exclusive_stream;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 87c0d1ce1b9f..63a1152766f8 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1677,9 +1677,17 @@ static void gen8_disable_metric_set(struct drm_i915_private *dev_priv)
/* NOP */
 }
 
-static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
+static void gen7_oa_enable(struct drm_i915_private *dev_priv)
 {
-   lockdep_assert_held(&dev_priv->perf.hook_lock);
+   /* Reset buf pointers so we don't forward reports from before now.
+*
+* Think carefully if considering trying to avoid this, since it
+* also ensures status flags and the buffer itself are cleared
+* in error paths, and we have checks for invalid reports based
+* on the assumption that certain fields are written to zeroed
+* memory which this helps maintains.
+*/
+   gen7_init_oa_buffer(dev_priv);
 
if (dev_priv->perf.oa.exclusive_stream->enabled) {
struct i915_gem_context *ctx =
@@ -1702,25 +1710,6 @@ static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
I915_WRITE(GEN7_OACONTROL, 0);
 }
 
-static void gen7_oa_enable(struct drm_i915_private *dev_priv)
-{
-   unsigned long flags;
-
-   /* Reset buf pointers so we don't forward reports from before now.
-*
-* Think carefully if considering trying to avoid this, since it
-* also ensures status flags and the buffer itself are cleared
-* in error paths, and we have checks for invalid reports based
-* on the assumption that certain fields are written to zeroed
-* memory which this helps maintains.
-*/
-   gen7_init_oa_buffer(dev_priv);
-
-   spin_lock_irqsave(&dev_priv->perf.hook_lock, flags);
-   gen7_update_oacontrol_locked(dev_priv);
-   spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
-}
-
 static void gen8_oa_enable(struct drm_i915_private *dev_priv)
 {
u32 report_format = dev_priv->perf.oa.oa_buffer.format;
@@ -2999,7 +2988,6 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 
INIT_LIST_HEAD(&dev_priv->perf.streams);
mutex_init(&dev_priv->perf.lock);
-   spin_lock_init(&dev_priv->perf.hook_lock);
spin_lock_init(&dev_priv->perf.oa.oa_buffer.ptr_lock);
 
dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
-- 
2.12.0


[Intel-gfx] [PATCH v3 2/7] drm/i915: expose _SUBSLICE_MASK GETPARM

2017-04-05 Thread Robert Bragg
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices enabled. This information is required,
for example, to be able to analyse some OA counter reports where the
counter configuration depends on the HW sub slice configuration.
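
Under the uniform-mask assumption stated above, the total number of enabled
sub slices is just the product of the two population counts. A small
sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count for a 32-bit mask. */
static unsigned int sketch_popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;

	return n;
}

/* Total enabled sub slices, assuming every enabled slice shares the same
 * sub slice mask (the uniformity assumption from the commit message).
 */
unsigned int total_enabled_subslices(uint32_t slice_mask,
				     uint32_t subslice_mask)
{
	unsigned int n_slices = sketch_popcount32(slice_mask);
	unsigned int n_subslices = sketch_popcount32(subslice_mask);

	assert(n_slices <= 32 && n_subslices <= 32);

	return n_slices * n_subslices;
}
```

If the hardware ever has non-uniform sub slice fusing, this product would
overcount, which is the limitation the uniform-mask wording is flagging.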

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 337acf034d36..e4ed70d21e91 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -362,6 +362,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
if (!value)
return -ENODEV;
break;
+   case I915_PARAM_SUBSLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.subslice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f47fb7f26f36..e0599e729e68 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -393,6 +393,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
 #define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
+#define I915_PARAM_SUBSLICE_MASK    46
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0


[Intel-gfx] [PATCH v3 3/7] drm/i915/perf: Add 'render basic' Gen8+ OA unit configs

2017-04-05 Thread Robert Bragg
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

v2: add newlines to debug messages + fix comment (Matthew Auld)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/Makefile |   8 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +
 drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++
 drivers/gpu/drm/i915/i915_oa_bdw.h|  38 
 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 +
 drivers/gpu/drm/i915/i915_oa_bxt.h|  38 
 drivers/gpu/drm/i915/i915_oa_chv.c| 225 
 drivers/gpu/drm/i915/i915_oa_chv.h|  38 
 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h |  38 
 14 files changed, 1791 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 2cf04504e494..41400a138a1e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -127,7 +127,13 @@ i915-y += i915_vgpu.o
 
 # perf code
 i915-y += i915_perf.o \
- i915_oa_hsw.o
+ i915_oa_hsw.o \
+ i915_oa_bdw.o \
+ i915_oa_chv.o \
+ i915_oa_sklgt2.o \
+ i915_oa_sklgt3.o \
+ i915_oa_sklgt4.o \
+ i915_oa_bxt.o
 
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 51bd6c6034bb..9c37b73ac7ac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2469,6 +2469,8 @@ struct drm_i915_private {
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+   const struct i915_oa_reg *flex_regs;
+   int flex_regs_len;
 
struct {
struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c
new file mode 100644
index 000000000000..b0b1b75fb431
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_bdw.c
@@ -0,0 +1,380 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "i915_drv.h"
+#include "i915_oa_bdw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_bdw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+  

[Intel-gfx] [PATCH v3 0/7] Enable OA unit for Gen 8 and 9 in i915 perf

2017-04-05 Thread Robert Bragg
Adds some R/Bs from Matthew and some updates based on Matthew's feedback

Notably the 'Add OA unit support for Gen 8+' patch now avoids duplicating lots
of fiddly tail race workaround code by adding a vfunc for reading the OA tail
pointer register.
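
A rough sketch of the vfunc idea, for anyone skimming: the shared tail
handling calls through a per-gen function pointer rather than being
duplicated. Everything below is a userspace stand-in; the struct names, the
fake register block and the mask value are assumptions for illustration,
not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for MMIO state; real code would read hardware registers. */
struct sketch_oa_regs {
	uint32_t gen7_oastatus1; /* tail pointer shares a status register */
	uint32_t gen8_oatailptr; /* dedicated tail pointer register */
};

/* ASSUMPTION: the low 6 bits are not part of the tail address. */
#define SKETCH_TAIL_MASK 0xffffffc0u

struct sketch_oa_ops {
	uint32_t (*oa_hw_tail_read)(const struct sketch_oa_regs *regs);
};

uint32_t sketch_gen7_tail_read(const struct sketch_oa_regs *regs)
{
	return regs->gen7_oastatus1 & SKETCH_TAIL_MASK;
}

uint32_t sketch_gen8_tail_read(const struct sketch_oa_regs *regs)
{
	return regs->gen8_oatailptr & SKETCH_TAIL_MASK;
}

/* Shared code: the only per-gen difference is behind the vfunc. */
uint32_t sketch_read_tail(const struct sketch_oa_ops *ops,
			  const struct sketch_oa_regs *regs)
{
	assert(ops && ops->oa_hw_tail_read && regs);

	return ops->oa_hw_tail_read(regs);
}
```

The point of the design is that the race workaround logic lives in one
place and only the register read varies by generation.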

Robert Bragg (7):
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
  drm/i915/perf: Add OA unit support for Gen 8+
  drm/i915/perf: Add more OA configs for BDW, CHV, SKL + BXT
  drm/i915/perf: per-gen timebase for checking sample freq
  drm/i915/perf: remove perf.hook_lock

 drivers/gpu/drm/i915/Makefile   |8 +-
 drivers/gpu/drm/i915/i915_drv.c |   10 +
 drivers/gpu/drm/i915/i915_drv.h |   50 +-
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c  | 5154 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c  | 2541 +++
 drivers/gpu/drm/i915/i915_oa_bxt.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c  | 2730 
 drivers/gpu/drm/i915/i915_oa_chv.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_hsw.c  |   58 +-
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3303 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c|  963 +-
 drivers/gpu/drm/i915/i915_reg.h |   22 +
 drivers/gpu/drm/i915/intel_lrc.c|5 +
 include/uapi/drm/i915_drm.h |   21 +-
 21 files changed, 20745 insertions(+), 115 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

-- 
2.12.0


[Intel-gfx] [PATCH v3] drm/i915/perf: rate limit spurious oa report notice

2017-04-05 Thread Robert Bragg
Instead of initializing and summarising the number of throttled messages in
the driver _init / _fini we now do this when opening / closing an OA stream.

--- >8 ---

This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.
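
To illustrate the behaviour being aimed for, here is a simplified userspace
model of interval/burst rate limiting with a count of suppressed messages.
It is not the kernel's ratelimit implementation; the struct, the function
names and the explicit "now" tick parameter are all hypothetical, and the
missed count is deliberately kept until the caller reads it so a summary
can be printed at close time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_ratelimit {
	uint64_t interval;    /* window length, in caller-defined ticks */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages allowed in the current window */
	unsigned int missed;  /* messages suppressed since init */
};

void sketch_ratelimit_init(struct sketch_ratelimit *rs, uint64_t interval,
			   unsigned int burst)
{
	assert(rs != NULL);

	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Returns true if the caller may print now, false if it should stay quiet. */
bool sketch_ratelimit_allow(struct sketch_ratelimit *rs, uint64_t now)
{
	/* Start a fresh window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

A caller would check sketch_ratelimit_allow() before each notice and report
rs->missed once when the stream is closed.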

v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  6 ++
 drivers/gpu/drm/i915/i915_perf.c | 28 +++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 51a410911d81..51bd6c6034bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2454,6 +2454,12 @@ struct drm_i915_private {
wait_queue_head_t poll_wq;
bool pollin;
 
+   /**
+* For rate limiting any notifications of spurious
+* invalid OA reports
+*/
+   struct ratelimit_state spurious_report_rs;
+
bool periodic;
int period_exponent;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5738b99caa5b..3277a52ce98e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_NOTE("Skipping spurious, invalid OA report\n");
+   if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs))
+   DRM_NOTE("Skipping spurious, invalid OA report\n");
continue;
}
 
@@ -913,6 +914,11 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
oa_put_render_ctx_id(stream);
 
dev_priv->perf.oa.exclusive_stream = NULL;
+
+   if (dev_priv->perf.oa.spurious_report_rs.missed) {
+   DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n",
+dev_priv->perf.oa.spurious_report_rs.missed);
+   }
 }
 
 static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
@@ -1268,6 +1274,26 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
return -EINVAL;
}
 
+   /* We set up some ratelimit state to potentially throttle any _NOTES
+* about spurious, invalid OA reports which we don't forward to
+* userspace.
+*
+* The initialization is associated with opening the stream (not driver
+* init) considering we print a _NOTE about any throttling when closing
+* the stream instead of waiting until driver _fini which no one would
+* ever see.
+*
+* Using the same limiting factors as printk_ratelimit()
+*/
+   ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs,
+5 * HZ, 10);
+   /* Since we use a DRM_NOTE for spurious reports it would be
+* inconsistent to let __ratelimit() automatically print a warning for
+* throttling.
+*/
+   ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs,
+   RATELIMIT_MSG_ON_RELEASE);
+
stream->sample_size = sizeof(struct drm_i915_perf_record_header);
 
format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
-- 
2.12.0
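
The throttling scheme this patch relies on can be sketched outside the kernel
as follows. This is a simplified userspace model, not the kernel's ratelimit
code: struct rl_state, rl_init() and rl_allow() are hypothetical names, and
time is passed in as an integer tick instead of being read from jiffies. It
shows the two properties the commit message depends on: at most "burst"
notices per interval, and a "missed" count that survives until the stream is
closed so it can be summarised there.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a burst/interval rate limiter.
 * The field names loosely mirror the kernel's struct ratelimit_state,
 * but this is not the kernel implementation. */
struct rl_state {
	long interval;  /* window length, in caller-defined ticks */
	int burst;      /* messages allowed per window */
	int printed;    /* messages allowed so far in this window */
	int missed;     /* messages suppressed since rl_init() */
	long begin;     /* tick at which the current window started */
	bool started;   /* false until the first rl_allow() call */
};

void rl_init(struct rl_state *rs, long interval, int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->printed = 0;
	rs->missed = 0;
	rs->begin = 0;
	rs->started = false;
}

/* Returns true if a message may be emitted at tick 'now'. */
bool rl_allow(struct rl_state *rs, long now)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}

	/* Open a new window once the interval has elapsed. 'missed' is
	 * deliberately left alone so it can be reported at close time. */
	if (now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

In this model a caller would invoke rl_init() when a stream is opened, guard
each notice with rl_allow(), and print the accumulated "missed" count once
when the stream is closed.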



Re: [Intel-gfx] [PATCH v2 5/5] drm/i915: Add more OA configs for BDW, CHV, SKL + BXT

2017-04-05 Thread Robert Bragg
On Mon, Mar 27, 2017 at 7:16 PM, Matthew Auld <
matthew.william.a...@gmail.com> wrote:

> On 03/23, Robert Bragg wrote:
> > These are auto generated from an XML description of metric sets,
> > currently maintained in gputop, ref:
> >
> >  https://github.com/rib/gputop
> >  > gputop-data/oa-*.xml
> >  > scripts/i915-perf-kernelgen.py
> >
> >  $ make -C gputop-data -f Makefile.xml
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > ---
>
> 
>
> >  int i915_oa_select_metric_set_bdw(struct drm_i915_private *dev_priv)
> >  {
> > - dev_priv->perf.oa.mux_regs = NULL;
> > - dev_priv->perf.oa.mux_regs_len = 0;
> > - dev_priv->perf.oa.b_counter_regs = NULL;
> > - dev_priv->perf.oa.b_counter_regs_len = 0;
> > - dev_priv->perf.oa.flex_regs = NULL;
> > - dev_priv->perf.oa.flex_regs_len = 0;
> > + dev_priv->perf.oa.mux_regs = NULL;
> > + dev_priv->perf.oa.mux_regs_len = 0;
> > + dev_priv->perf.oa.b_counter_regs = NULL;
> > + dev_priv->perf.oa.b_counter_regs_len = 0;
> > + dev_priv->perf.oa.flex_regs = NULL;
> > + dev_priv->perf.oa.flex_regs_len = 0;
> What changed? I can't tell from the diff...
>

I don't think anything changed in those lines, it's just that the diff uses
the start of the function for context and then has to delete these to add
the full replacement for the function body which included substantial
changes to add the cases for the additional configs.


>
> Otherwise assuming you re-spin with the DRM_DEBUG changes:
> Reviewed-by: Matthew Auld <matthew.a...@intel.com>
>

thanks


Re: [Intel-gfx] [PATCH] drm/i915/perf: remove user triggerable warn

2017-03-28 Thread Robert Bragg
On Mon, Mar 27, 2017 at 9:32 PM, Matthew Auld <matthew.a...@intel.com>
wrote:

> Don't throw a warning if we are given an invalid property id. While
> here let's also bring back Robert's original idea of catching unhandled
> enumeration values at compile time.
>
> Fixes: eec688e1420d ("drm/i915: Add i915 perf infrastructure")
> Signed-off-by: Matthew Auld <matthew.a...@intel.com>
> Cc: Robert Bragg <rob...@sixbynine.org>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 8c121187ff39..e52bc6a581e6 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1793,6 +1793,11 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> if (ret)
> return ret;
>
> +   if (id == 0 || id >= DRM_I915_PERF_PROP_MAX) {
> +   DRM_DEBUG("Unknown i915 perf property ID\n");
> +   return -EINVAL;
> +   }
> +
> switch ((enum drm_i915_perf_property_id)id) {
> case DRM_I915_PERF_PROP_CTX_HANDLE:
> props->single_context = 1;
> @@ -1862,9 +1867,8 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> props->oa_periodic = true;
> props->oa_period_exponent = value;
> break;
> -   default:
> +   case DRM_I915_PERF_PROP_MAX:
> MISSING_CASE(id);
> -   DRM_DEBUG("Unknown i915 perf property ID\n");
> return -EINVAL;
> }
>
> --
> 2.9.3
>
>
Looks good to me, thanks.

Reviewed-by: Robert Bragg <rob...@sixbynine.org>
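
The pattern in the patch above (range-check the ID first, then switch over
the enum with no default label) can be illustrated in isolation. This is a
minimal sketch under assumed names: enum prop_id, struct props and
read_property() are hypothetical stand-ins, not the driver's definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical property IDs; PROP_MAX is a sentinel, not a valid ID. */
enum prop_id {
	PROP_CTX_HANDLE = 1,
	PROP_SAMPLE_OA,
	PROP_OA_EXPONENT,
	PROP_MAX /* must stay last */
};

struct props {
	int single_context;
	int sample_oa;
	uint64_t oa_exponent;
};

/* Returns 0 on success or -EINVAL for an ID outside the known range. */
int read_property(struct props *p, uint64_t id, uint64_t value)
{
	/* Reject out-of-range IDs quietly before the switch: bad input
	 * from userspace is an ordinary error, not a driver bug. */
	if (id == 0 || id >= PROP_MAX)
		return -EINVAL;

	/* No 'default:' label, so a compiler with -Wswitch (part of
	 * -Wall in GCC and Clang) warns if a new enumerator is unhandled. */
	switch ((enum prop_id)id) {
	case PROP_CTX_HANDLE:
		p->single_context = 1;
		break;
	case PROP_SAMPLE_OA:
		p->sample_oa = (value != 0);
		break;
	case PROP_OA_EXPONENT:
		p->oa_exponent = value;
		break;
	case PROP_MAX:
		/* Unreachable given the range check above. */
		return -EINVAL;
	}

	return 0;
}
```

Listing the sentinel as an explicit case is what lets the switch stay free of
a default label while still covering every enumerator.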


[Intel-gfx] [PATCH v2] drm/i915: Add 'render basic' Gen8+ OA unit configs

2017-03-23 Thread Robert Bragg
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

v2: add newlines to debug messages + fix comment (Matthew Auld)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/Makefile |   8 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +
 drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++
 drivers/gpu/drm/i915/i915_oa_bdw.h|  38 
 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 +
 drivers/gpu/drm/i915/i915_oa_bxt.h|  38 
 drivers/gpu/drm/i915/i915_oa_chv.c| 225 
 drivers/gpu/drm/i915/i915_oa_chv.h|  38 
 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h |  38 
 14 files changed, 1791 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 2cf04504e494..41400a138a1e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -127,7 +127,13 @@ i915-y += i915_vgpu.o
 
 # perf code
 i915-y += i915_perf.o \
- i915_oa_hsw.o
+ i915_oa_hsw.o \
+ i915_oa_bdw.o \
+ i915_oa_chv.o \
+ i915_oa_sklgt2.o \
+ i915_oa_sklgt3.o \
+ i915_oa_sklgt4.o \
+ i915_oa_bxt.o
 
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a7986c0c29ad..c4156a8a5dc0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2486,6 +2486,8 @@ struct drm_i915_private {
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+   const struct i915_oa_reg *flex_regs;
+   int flex_regs_len;
 
struct {
struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c
new file mode 100644
index ..b0b1b75fb431
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_bdw.c
@@ -0,0 +1,380 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "i915_drv.h"
+#include "i915_oa_bdw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_bdw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+  

Re: [Intel-gfx] [PATCH v2 3/5] drm/i915: Add 'render basic' Gen8+ OA unit configs

2017-03-23 Thread Robert Bragg
On Fri, Mar 24, 2017 at 12:52 AM, Robert Bragg <rob...@sixbynine.org> wrote:
> On Thu, Mar 23, 2017 at 8:48 PM, Matthew Auld
> <matthew.william.a...@gmail.com> wrote:
>> On 23 March 2017 at 20:18, Robert Bragg <rob...@sixbynine.org> wrote:
>>> Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
>>> render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
>>> auto generated from an XML description of metric sets, currently
>>> maintained in gputop, ref:
>>>
>>>  https://github.com/rib/gputop
>>>  > gputop-data/oa-*.xml
>>>  > scripts/i915-perf-kernelgen.py
>>>
>>>  $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic
>>>
>>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>>
>> 
>>
>>> +
>>> +int i915_oa_select_metric_set_bdw(struct drm_i915_private *dev_priv)
>>> +{
>>> +   dev_priv->perf.oa.mux_regs = NULL;
>>> +   dev_priv->perf.oa.mux_regs_len = 0;
>>> +   dev_priv->perf.oa.b_counter_regs = NULL;
>>> +   dev_priv->perf.oa.b_counter_regs_len = 0;
>>> +   dev_priv->perf.oa.flex_regs = NULL;
>>> +   dev_priv->perf.oa.flex_regs_len = 0;
>>> +
>>> +   switch (dev_priv->perf.oa.metrics_set) {
>>> +   case METRIC_SET_ID_RENDER_BASIC:
>>> +   dev_priv->perf.oa.mux_regs =
>>> +   get_render_basic_mux_config(dev_priv,
>>> +   &dev_priv->perf.oa.mux_regs_len);
>>> +   if (!dev_priv->perf.oa.mux_regs) {
>>> +   DRM_DEBUG_DRIVER("No suitable MUX config for \"RENDER_BASIC\" metric set");
>> You forgot to update your script ;)
>
> Hmm, that's odd; I didn't:
> https://github.com/rib/gputop/commit/95bc05957d488e2004ef2e7b5ba5b33d7dd559dd
>
> So need to double check what's up here.

*facepalm* - must have been on autopilot, sorry will fix.

>
>>
>>> +
>>> +   /* EINVAL because *_register_sysfs already checked this
>>> +* and so it wouldn't have been advertised so userspace and
>>> +* so shouldn't have been requested
>> s/so userspace/to userspace/ ?
>
> Ah yup.
>
>>
>> Otherwise assuming the configs are indeed correct:
>> Reviewed-by: Matthew Auld <matthew.a...@intel.com>
>
> Thanks


Re: [Intel-gfx] [PATCH v2 3/5] drm/i915: Add 'render basic' Gen8+ OA unit configs

2017-03-23 Thread Robert Bragg
On Thu, Mar 23, 2017 at 8:48 PM, Matthew Auld
<matthew.william.a...@gmail.com> wrote:
> On 23 March 2017 at 20:18, Robert Bragg <rob...@sixbynine.org> wrote:
>> Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
>> render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
>> auto generated from an XML description of metric sets, currently
>> maintained in gputop, ref:
>>
>>  https://github.com/rib/gputop
>>  > gputop-data/oa-*.xml
>>  > scripts/i915-perf-kernelgen.py
>>
>>  $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>
> 
>
>> +
>> +int i915_oa_select_metric_set_bdw(struct drm_i915_private *dev_priv)
>> +{
>> +   dev_priv->perf.oa.mux_regs = NULL;
>> +   dev_priv->perf.oa.mux_regs_len = 0;
>> +   dev_priv->perf.oa.b_counter_regs = NULL;
>> +   dev_priv->perf.oa.b_counter_regs_len = 0;
>> +   dev_priv->perf.oa.flex_regs = NULL;
>> +   dev_priv->perf.oa.flex_regs_len = 0;
>> +
>> +   switch (dev_priv->perf.oa.metrics_set) {
>> +   case METRIC_SET_ID_RENDER_BASIC:
>> +   dev_priv->perf.oa.mux_regs =
>> +   get_render_basic_mux_config(dev_priv,
>> +   &dev_priv->perf.oa.mux_regs_len);
>> +   if (!dev_priv->perf.oa.mux_regs) {
>> +   DRM_DEBUG_DRIVER("No suitable MUX config for \"RENDER_BASIC\" metric set");
> You forgot to update your script ;)

Hmm, that's odd; I didn't:
https://github.com/rib/gputop/commit/95bc05957d488e2004ef2e7b5ba5b33d7dd559dd

So need to double check what's up here.

>
>> +
>> +   /* EINVAL because *_register_sysfs already checked this
>> +* and so it wouldn't have been advertised so userspace and
>> +* so shouldn't have been requested
> s/so userspace/to userspace/ ?

Ah yup.

>
> Otherwise assuming the configs are indeed correct:
> Reviewed-by: Matthew Auld <matthew.a...@intel.com>

Thanks


[Intel-gfx] [PATCH v2 4/5] drm/i915: Add OA unit support for Gen 8+

2017-03-23 Thread Robert Bragg
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
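
To illustrate what filtering on the CPU by context ID could look like, here is
a minimal sketch. The report layout is an assumption made for the example:
struct oa_report and filter_reports_by_ctx() are hypothetical and do not
reflect the real OA report format. Note that this only selects which reports
are forwarded; the counter values in those reports still include system-wide
progress, which is why the paranoid setting matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified report: the real OA report layout differs. */
struct oa_report {
	uint32_t ctx_id;     /* context the HW tagged this report with */
	uint32_t timestamp;
};

/* Copies reports tagged with 'ctx_id' from 'in' to 'out' and returns
 * how many were kept. 'out' must have room for 'n' entries. */
size_t filter_reports_by_ctx(const struct oa_report *in, size_t n,
			     uint32_t ctx_id, struct oa_report *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].ctx_id == ctx_id)
			out[kept++] = in[i];
	}

	return kept;
}
```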

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h |   29 +-
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_perf.c| 1034 ---
 drivers/gpu/drm/i915/i915_reg.h |   22 +
 drivers/gpu/drm/i915/intel_lrc.c|5 +
 include/uapi/drm/i915_drm.h |   19 +-
 6 files changed, 1029 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4156a8a5dc0..190e699d5851 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -731,6 +731,7 @@ intel_uncore_forcewake_for_reg(struct drm_i915_private *dev_priv,
   i915_reg_t reg, unsigned int op);
 
 struct intel_uncore_funcs {
+   int (*wait_for_rcs_busy)(struct drm_i915_private *dev_priv);
void (*force_wake_get)(struct drm_i915_private *dev_priv,
   enum forcewake_domains domains);
void (*force_wake_put)(struct drm_i915_private *dev_priv,
@@ -2084,9 +2085,17 @@ struct i915_oa_ops {
void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
 
/**
-* @enable_metric_set: Applies any MUX configuration to set up the
-* Boolean and Custom (B/C) counters that are part of the counter
-* reports being sampled. May apply system constraints such as
+* @select_metric_set: The auto generated code that checks whether a
+* requested OA config is applicable to the system and if so sets up
+* the mux, oa and flex eu register config pointers according to the
+* current dev_priv->perf.oa.metrics_set.
+*/
+   int (*select_metric_set)(struct drm_i915_private *dev_priv);
+
+   /**
+* @enable_metric_set: Selects and applies any MUX configuration to set
+* up the Boolean and Custom (B/C) counters that are part of the
+* counter reports being sampled. May apply system constraints such as
 * disabling EU clock gating as required.
 */
int (*enable_metric_set)(struct drm_i915_private *dev_priv);
@@ -2492,6 +2501,7 @@ struct drm_i915_private {
struct {
struct i915_vma *vma;
u8 *vaddr;
+   u32 la

[Intel-gfx] [PATCH v2 3/5] drm/i915: Add 'render basic' Gen8+ OA unit configs

2017-03-23 Thread Robert Bragg
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/Makefile |   8 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +
 drivers/gpu/drm/i915/i915_oa_bdw.c| 380 ++
 drivers/gpu/drm/i915/i915_oa_bdw.h|  38 
 drivers/gpu/drm/i915/i915_oa_bxt.c| 238 +
 drivers/gpu/drm/i915/i915_oa_bxt.h|  38 
 drivers/gpu/drm/i915/i915_oa_chv.c| 225 
 drivers/gpu/drm/i915/i915_oa_chv.h|  38 
 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h |  38 
 14 files changed, 1791 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 2cf04504e494..41400a138a1e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -127,7 +127,13 @@ i915-y += i915_vgpu.o
 
 # perf code
 i915-y += i915_perf.o \
- i915_oa_hsw.o
+ i915_oa_hsw.o \
+ i915_oa_bdw.o \
+ i915_oa_chv.o \
+ i915_oa_sklgt2.o \
+ i915_oa_sklgt3.o \
+ i915_oa_sklgt4.o \
+ i915_oa_bxt.o
 
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a7986c0c29ad..c4156a8a5dc0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2486,6 +2486,8 @@ struct drm_i915_private {
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+   const struct i915_oa_reg *flex_regs;
+   int flex_regs_len;
 
struct {
struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c
new file mode 100644
index ..539484aef6b5
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_bdw.c
@@ -0,0 +1,380 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "i915_drv.h"
+#include "i915_oa_bdw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_bdw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2720), 0x },
+   {

[Intel-gfx] [PATCH v2 2/5] drm/i915: expose _SUBSLICE_MASK GETPARM

2017-03-23 Thread Robert Bragg
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices enabled. This information is required,
for example, to be able to analyse some OA counter reports where the
counter configuration depends on the HW sub slice configuration.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ad12ffc356be..c39a03c1f78d 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -362,6 +362,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
if (!value)
return -ENODEV;
break;
+   case I915_PARAM_SUBSLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.subslice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f47fb7f26f36..e0599e729e68 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -393,6 +393,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
 #define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
+#define I915_PARAM_SUBSLICE_MASK46
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0
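
As a sketch of how userspace might combine the two masks, the helpers below
compute the total number of enabled subslices under the same "uniform mask
across all slices" assumption stated in the commit message. The function
names are hypothetical, and the GETPARAM ioctl calls that would supply the
masks are left out to keep the example self-contained.

```c
#include <stdint.h>

/* Number of set bits in a mask. */
unsigned int mask_popcount(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Total enabled subslices, assuming every enabled slice has the
 * same subslice mask. */
unsigned int total_subslices(uint32_t slice_mask, uint32_t subslice_mask)
{
	return mask_popcount(slice_mask) * mask_popcount(subslice_mask);
}
```

If the hardware ever reported different subslice masks per slice, a single
product like this would no longer be valid and a per-slice query would be
needed.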



[Intel-gfx] [PATCH v2 1/5] drm/i915: expose _SLICE_MASK GETPARM

2017-03-23 Thread Robert Bragg
Enables userspace to determine the number of slices enabled and also
know what specific slices are enabled. This information is required, for
example, to be able to analyse some OA counter reports where the counter
configuration depends on the HW slice configuration.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6d9944a00b7d..ad12ffc356be 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -357,6 +357,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 */
value = 1;
break;
+   case I915_PARAM_SLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.slice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3554495bef13..f47fb7f26f36 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU38
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
+#define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.12.0
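
The two uses of the mask named in the commit message (how many slices are
enabled, and which ones) could be decoded in userspace roughly as below. This
is a sketch with hypothetical helper names; the mask value is assumed to have
been read already via DRM_IOCTL_I915_GETPARAM with I915_PARAM_SLICE_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if slice 'index' (0-based) is set in the mask. */
bool slice_is_enabled(uint32_t slice_mask, unsigned int index)
{
	return index < 32 && ((slice_mask >> index) & 1u);
}

/* Number of enabled slices. */
unsigned int slice_count(uint32_t slice_mask)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < 32; i++)
		n += slice_is_enabled(slice_mask, i) ? 1u : 0u;
	return n;
}
```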



[Intel-gfx] [PATCH v2 0/5] Enable OA unit for Gen 8 and 9 in i915 perf

2017-03-23 Thread Robert Bragg
Compared to the last Gen8+ OA series I've been investigating and debugging a
number of issues with the configuration of the Flexible EU counters whose state
is per-context:

* Removes assumption about the mmio registers having a contiguous range of
  addresses which wasn't true.
* Ensures that newly allocated contexts (while the OA unit is in use) will
  have their OA state properly initialized.
* Makes sure to only update render context state.

Instead of attempting to have a general purpose uncore api for dealing with the
details of allowing mmio writes to per-context registers this just adds a
constrained utility in i915_perf.c itself.

The per-gen initialization/configuration has been reworked to avoid a lot of
copy and paste boilerplate.

Addresses the codegen issue Matt noticed with the Broadwell ComputeExtended
metric set after checking with the authors of the XML files that the second
unconditional MUX config should effectively be appended to the end of all the
conditional configs.

(as with the last gen8+ series I sent, it's based on the various gen7 prep
patches that have been sent out separately)

These patches can be pulled from my wip/rib/oa-next branch here:

  https://github.com/rib/linux

In case anyone wants to take a look at the IGT tests so far they can be found
here:

  https://github.com/rib/intel-gpu-tools/commits/wip/rib/i915-perf-tests

Regards,
- Robert

Robert Bragg (5):
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915: Add 'render basic' Gen8+ OA unit configs
  drm/i915: Add OA unit support for Gen 8+
  drm/i915: Add more OA configs for BDW, CHV, SKL + BXT

 drivers/gpu/drm/i915/Makefile   |8 +-
 drivers/gpu/drm/i915/i915_drv.c |   10 +
 drivers/gpu/drm/i915/i915_drv.h |   31 +-
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c  | 5154 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c  | 2541 +++
 drivers/gpu/drm/i915/i915_oa_bxt.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c  | 2730 
 drivers/gpu/drm/i915/i915_oa_chv.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_hsw.c  |   58 +-
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3303 
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c| 1034 ++-
 drivers/gpu/drm/i915/i915_reg.h |   22 +
 drivers/gpu/drm/i915/intel_lrc.c|5 +
 include/uapi/drm/i915_drm.h |   21 +-
 21 files changed, 20825 insertions(+), 87 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

-- 
2.12.0



[Intel-gfx] [PATCH v2] drm/i915/perf: rate limit spurious oa report notice

2017-03-23 Thread Robert Bragg
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  6 ++
 drivers/gpu/drm/i915/i915_perf.c | 17 -
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a7b49cad6ab2..a7986c0c29ad 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2471,6 +2471,12 @@ struct drm_i915_private {
wait_queue_head_t poll_wq;
bool pollin;
 
+   /**
+* For rate limiting any notifications of spurious
+* invalid OA reports
+*/
+   struct ratelimit_state spurious_report_rs;
+
bool periodic;
int period_exponent;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c09a7c9b61d9..36d07ca68029 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_NOTE("Skipping spurious, invalid OA report\n");
+   if (__ratelimit(&dev_priv->perf.oa.spurious_report_rs))
+       DRM_NOTE("Skipping spurious, invalid OA report\n");
continue;
}
 
@@ -2144,6 +2145,15 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
if (!IS_HASWELL(dev_priv))
return;
 
+   /* Using the same limiting factors as printk_ratelimit() */
+   ratelimit_state_init(&dev_priv->perf.oa.spurious_report_rs,
+   5 * HZ, 10);
+   /* We use a DRM_NOTE for spurious reports so it would be
+* inconsistent to print a warning for throttling.
+*/
+   ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs,
+   RATELIMIT_MSG_ON_RELEASE);
+
hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
 CLOCK_MONOTONIC, HRTIMER_MODE_REL);
dev_priv->perf.oa.poll_check_timer.function = oa_poll_check_timer_cb;
@@ -2182,6 +2192,11 @@ void i915_perf_fini(struct drm_i915_private *dev_priv)
if (!dev_priv->perf.initialized)
return;
 
+   if (dev_priv->perf.oa.spurious_report_rs.missed) {
+   DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting",
+dev_priv->perf.oa.spurious_report_rs.missed);
+   }
+
unregister_sysctl_table(dev_priv->perf.sysctl_header);
 
memset(&dev_priv->perf.oa.ops, 0, sizeof(dev_priv->perf.oa.ops));
-- 
2.12.0



Re: [Intel-gfx] [PATCH 3/6] drm/i915: Add uncore mmio api for per-context registers

2017-03-23 Thread Robert Bragg
On Thu, Feb 23, 2017 at 3:35 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> On Wed, Feb 22, 2017 at 04:36:31PM +0000, Robert Bragg wrote:
>> Since the exponent for periodic OA counter sampling is maintained in a
>> per-context register while we want to treat it as if it were global
>> state we need to be able to safely issue an mmio write to a per-context
>> register and affect any currently running context.
>>
>> We have to take some extra care in this case and this adds a utility
>> api to encapsulate what's required.
>
> Been weighing up the pros/cons of having this in uncore. It doesn't
> attempt to be generic nor replace the existing instance, so atm I think
> it might be better as local to i915_perf.

Thanks. Can you maybe clarify for me what "the existing instance"
that's not replaced is in this context?

>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.h |  4 ++
>>  drivers/gpu/drm/i915/i915_reg.h |  3 ++
>>  drivers/gpu/drm/i915/intel_uncore.c | 73 
>> +
>>  3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 105b97bd34d7..c8d03a2e89cc 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -718,6 +718,7 @@ intel_uncore_forcewake_for_reg(struct drm_i915_private 
>> *dev_priv,
>>  i915_reg_t reg, unsigned int op);
>>
>>  struct intel_uncore_funcs {
>> + int (*wait_for_rcs_busy)(struct drm_i915_private *dev_priv);
>>   void (*force_wake_get)(struct drm_i915_private *dev_priv,
>>   enum forcewake_domains 
>> domains);
>>   void (*force_wake_put)(struct drm_i915_private *dev_priv,
>> @@ -751,6 +752,7 @@ struct intel_uncore {
>>
>>   struct intel_uncore_funcs funcs;
>>
>> + atomic_t hold_rcs_busy_count;
>>   unsigned fifo_count;
>>
>>   enum forcewake_domains fw_domains;
>> @@ -3139,6 +3141,8 @@ static inline bool intel_vgpu_active(struct 
>> drm_i915_private *dev_priv)
>>  {
>>   return dev_priv->vgpu.active;
>>  }
>> +int intel_uncore_begin_ctx_mmio(struct drm_i915_private *dev_priv);
>> +void intel_uncore_end_ctx_mmio(struct drm_i915_private *dev_priv);
>>
>>  void
>>  i915_enable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h 
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index 141a5c1e3895..94d40e82edc1 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -2313,6 +2313,9 @@ enum skl_disp_power_wells {
>>  #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE  (1 << 12)
>>  #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE (1<<10)
>>
>> +#define GEN6_RCS_PWR_FSM _MMIO(0x22ac)
>> +#define GEN9_RCS_FE_FSM2 _MMIO(0x22a4)
>> +
>>  /* Fuse readout registers for GT */
>>  #define CHV_FUSE_GT  _MMIO(VLV_DISPLAY_BASE + 0x2168)
>>  #define   CHV_FGT_DISABLE_SS0(1 << 10)
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
>> b/drivers/gpu/drm/i915/intel_uncore.c
>> index 441c51fd9746..06bfe5f89ac5 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1274,6 +1274,16 @@ static void intel_uncore_fw_domains_init(struct 
>> drm_i915_private *dev_priv)
>>   WARN_ON(dev_priv->uncore.fw_domains == 0);
>>  }
>>
>> +static int gen8_wait_for_rcs_busy(struct drm_i915_private *dev_priv)
>> +{
>> + return wait_for((I915_READ(GEN6_RCS_PWR_FSM) & 0x3f) == 0x30, 50);
>> +}
>> +
>> +static int gen9_wait_for_rcs_busy(struct drm_i915_private *dev_priv)
>> +{
>> + return wait_for((I915_READ(GEN9_RCS_FE_FSM2) & 0x3f) == 0x30, 50);
>> +}
>> +
>>  #define ASSIGN_FW_DOMAINS_TABLE(d) \
>>  { \
>>   dev_priv->uncore.fw_domains_table = \
>> @@ -1305,6 +1315,8 @@ void intel_uncore_init(struct drm_i915_private 
>> *dev_priv)
>>   dev_priv->uncore.funcs.mmio_writel =
>>   gen9_decoupled_write32;
>>   }
>> + dev_priv->uncore.funcs.wait_for_rcs_busy =
>> + gen9_wait_for_rcs_busy;
>>   break;
>>   case 8:
>>   if (IS_

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Add more OA configs for BDW, CHV, SKL + BXT

2017-03-20 Thread Robert Bragg
On Wed, Mar 1, 2017 at 1:00 PM, Matthew Auld
<matthew.william.a...@gmail.com> wrote:
> On 02/22, Robert Bragg wrote:
>> These are auto generated from an XML description of metric sets,
>> currently maintained in gputop, ref:
>>
>>  https://github.com/rib/gputop
>>  > gputop-data/oa-*.xml
>>  > scripts/i915-perf-kernelgen.py
>>
>>  $ make -C gputop-data -f Makefile.xml
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>
> 
>
>> +
>> +static const struct i915_oa_reg *
>> +get_compute_extended_mux_config(struct drm_i915_private *dev_priv,
>> + int *len)
>> +{
>> + if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x01) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_0_subslices_0x01);
>> + return mux_config_compute_extended_1_0_subslices_0x01;
>> + } else if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x08) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_1_subslices_0x08);
>> + return mux_config_compute_extended_1_1_subslices_0x08;
>> + } else if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x02) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_2_subslices_0x02);
>> + return mux_config_compute_extended_1_2_subslices_0x02;
>> + } else if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x10) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_3_subslices_0x10);
>> + return mux_config_compute_extended_1_3_subslices_0x10;
>> + } else if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x04) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_4_subslices_0x04);
>> + return mux_config_compute_extended_1_4_subslices_0x04;
>> + } else if (INTEL_INFO(dev_priv)->sseu.subslice_mask & 0x20) {
>> + *len = 
>> ARRAY_SIZE(mux_config_compute_extended_1_5_subslices_0x20);
>> + return mux_config_compute_extended_1_5_subslices_0x20;
>> + *len = ARRAY_SIZE(mux_config_compute_extended);
>> + return mux_config_compute_extended;
> It looks like your script doesn't properly handle the unconditional mux
> config here.

Very well spotted!

This is a pretty ambiguous case and I'm not entirely sure what the
right thing to do here is and have asked the folks that came up with
the original OA config for their feedback. I *think* in this case it's
supposed to combine both as if the unconditional config were logically
concatenated at the end of all the conditional configs.

Locally I've at least updated the i915-perf-kernelgen.py script to
print a loud warning out for this so I could double check in case
there were other similar cases I'd missed - this BDW config seems to
be the only one with this ambiguity though. This makes me think that I
should update the mdapi-xml-convert.py script to just hide this
special case from our oa-*.xml file by explicitly concatenating with
the conditional configs if that's appropriate - then we don't need to
support multiple NOA configs in the i915 perf implementation. The
scripts can then have strict assertions that this doesn't change in
the future.
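
To make the "logically concatenated" idea concrete, here is a minimal sketch of what the combined config would look like, assuming that interpretation turns out to be the right one. The real change would live in the Python scripts; this is written in C only to match the generated kernel code. `struct oa_reg` and `concat_mux_config()` are hypothetical names for illustration, not part of i915.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a generated (register address, value) pair. */
struct oa_reg {
	uint32_t addr;
	uint32_t value;
};

/*
 * Append the unconditional registers after the selected conditional
 * registers, so the consumer only ever sees a single MUX config.
 * Returns the combined length, or 0 if 'out' is too small.
 */
static size_t concat_mux_config(struct oa_reg *out, size_t out_cap,
				const struct oa_reg *cond, size_t cond_len,
				const struct oa_reg *uncond, size_t uncond_len)
{
	if (cond_len > out_cap || uncond_len > out_cap - cond_len)
		return 0;

	if (cond_len)
		memcpy(out, cond, cond_len * sizeof(*out));
	if (uncond_len)
		memcpy(out + cond_len, uncond, uncond_len * sizeof(*out));

	return cond_len + uncond_len;
}
```

Doing this once at generation time is what would let the kernel side keep assuming exactly one NOA config per metric set.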

>
> 
>
>> +
>> +static const struct i915_oa_reg *
>> +get_test_oa_mux_config(struct drm_i915_private *dev_priv,
>> +int *len)
>> +{
>> + *len = ARRAY_SIZE(mux_config_test_oa);
>> + return mux_config_test_oa;
>> +}
>> +
>> +int i915_oa_select_metric_set_bdw(struct drm_i915_private *dev_priv)
>> +{
>> + dev_priv->perf.oa.mux_regs = NULL;
>> + dev_priv->perf.oa.mux_regs_len = 0;
>> + dev_priv->perf.oa.b_counter_regs = NULL;
>> + dev_priv->perf.oa.b_counter_regs_len = 0;
>> + dev_priv->perf.oa.flex_regs = NULL;
>> + dev_priv->perf.oa.flex_regs_len = 0;
>> +
>> + switch (dev_priv->perf.oa.metrics_set) {
>> + case METRIC_SET_ID_RENDER_BASIC:
>> + dev_priv->perf.oa.mux_regs =
>> + get_render_basic_mux_config(dev_priv,
>> + &dev_priv->perf.oa.mux_regs_len);
>> + if (!dev_priv->perf.oa.mux_regs) {
>> + DRM_DEBUG_DRIVER("No suitable MUX config for 
>> \"RENDER_BASIC\" metric set");
> Your script also needs to output a newline for DRM_DEBUG_DRIVER.

Ah yep, fixed locally thanks,

>


Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: rate limit spurious oa report notice

2017-03-20 Thread Robert Bragg
On Tue, Feb 28, 2017 at 1:33 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> On Tue, Feb 28, 2017 at 01:28:13PM +, Matthew Auld wrote:
>> On 22 February 2017 at 15:25, Robert Bragg <rob...@sixbynine.org> wrote:
>> > This change is pre-emptively aiming to avoid a potential cause of kernel
>> > logging noise in case some condition were to result in us seeing invalid
>> > OA reports.
>> >
>> > The workaround for the OA unit's tail pointer race condition is what
>> > avoids the primary known cause of invalid reports being seen and with
>> > that in place we aren't expecting to see this notice but it can't be
>> > entirely ruled out.
>> >
>> > Just in case some condition does lead to the notice then it's likely
>> > that it will be triggered repeatedly while attempting to append a
>> > sequence of reports and depending on the configured OA sampling
>> > frequency that might be a large number of repeat notices.
>> >
>> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> Reviewed-by: Matthew Auld <matthew.a...@intel.com>
>
> printk_ratelimits emits a WARN when triggered, defeating the purpose of
> using NOTE.

Hmm, that's a slightly awkward problem.

The warning comes from the common __ratelimit utility api that's used
for numerous things but by default will warn if the rate limiting
kicked in (once printing resumes again).

There's a RATELIMIT_MSG_ON_RELEASE flag that can be set to defer the
warning until ratelimit_state_exit() is called (which looks optional
or at least the flag could also be cleared before calling it to avoid
any warnings).

In general printk_ratelimit() doesn't seem like an ideal mechanism for
throttling messages, even if they are warnings, since the rate limit
state is shared across orthogonal components, so maybe it's best
anyway to use a custom rate limit state.

Considering the _MSG_ON_RELEASE flag, I think I'll init some ratelimit
state in dev_priv->perf.oa just for this message, use
ratelimit_set_flags() to set _MSG_ON_RELEASE and have a corresponding
debug/note in i915_perf_fini after checking the rs->missed counter.
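
As a rough illustration of that plan, here is a small userspace model of interval/burst rate limiting with a deferred "missed" count. It is only a sketch: `struct rl_state`, `rl_init()` and `rl_allow()` are made-up names, time is an abstract tick counter, and it is not the kernel's `struct ratelimit_state` implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; field names are illustrative only. */
struct rl_state {
	uint64_t interval;	/* window length in ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* suppressed messages, never reset here */
	uint64_t begin;		/* start of the current window */
	bool started;
};

static void rl_init(struct rl_state *rs, uint64_t interval, unsigned int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->printed = 0;
	rs->missed = 0;
	rs->begin = 0;
	rs->started = false;
}

/* Returns true if the caller may print a message at time 'now'. */
static bool rl_allow(struct rl_state *rs, uint64_t now)
{
	/* Open a fresh window on first use or once the interval elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Suppressed: only count it, nothing is reported at this point. */
	rs->missed++;
	return false;
}
```

The model never prints anything itself when throttling kicks in; the caller reads `missed` once at teardown and picks the log level for the summary, which is the behaviour I'm after with _MSG_ON_RELEASE.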

Actually at this point I'm slightly doubting whether having a warning
for throttling is that terrible in this case. Although I tend to think
it could be good to have dedicated ratelimit state here, there
conceptually is a point at which a visible warning could be appropriate
if we're really seeing a lot of these spurious reports. It's a grey
area since it's ok as a note if it's rare/intermittent, but something's
maybe gone wrong if we see *lots* of these. hmm.

Incidentally, I suppose this same observation is relevant to many of
the DRM_*_RATELIMITED macros too - there will be an inconsistent level
used to notify about any throttling?

Br,
- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 2/8] drm/i915: Expose OA sample source to userspace

2017-03-16 Thread Robert Bragg
On Thu, Mar 16, 2017 at 6:14 AM,  <sourab.gu...@intel.com> wrote:
> From: Sourab Gupta <sourab.gu...@intel.com>
>
> This patch exposes a new sample source field to userspace. This field can
> be populated to specify the origin of the OA report.
> Currently, the OA samples are being generated only periodically, and hence
> there's only one source flag enum definition right now, but there are other
> means of generating OA samples, such as via MI_RPC commands. The OA_SOURCE
> sample type is introducing a mechanism (for userspace) to distinguish various
> OA reports generated via different sources.

Maybe we could clarify that it's not intended as a replacement for the
reason field that's part of Gen8+ OA reports. For automatically
triggered reports written to the OABUFFER then the reason field will
distinguish e.g. periodic vs ctx-switch vs GO transition reasons for
the OA unit writing a report. The reason field is overloaded as the
RPT_ID field for MI_RPC reports so we need our own way of tracking the
difference.

>
> Signed-off-by: Sourab Gupta <sourab.gu...@intel.com>
> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 18 ++
>  include/uapi/drm/i915_drm.h  | 14 ++
>  2 files changed, 32 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 4b1db73..540c5b2 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -324,6 +324,7 @@
>  };
>
>  #define SAMPLE_OA_REPORT  (1<<0)
> +#define SAMPLE_OA_SOURCE_INFO  (1<<1)
>
>  /**
>   * struct perf_open_properties - for validated properties given to open a 
> stream
> @@ -659,6 +660,15 @@ static int append_oa_sample(struct i915_perf_stream 
> *stream,
> return -EFAULT;
> buf += sizeof(header);
>

I think I'd add a note mentioning that the ordering of _OA_SOURCE_INFO
before the _OA_REPORT forms part of the uapi - just as an extra guard
against someone reorganising code not realising that. Maybe over
paranoid though.

> +   if (sample_flags & SAMPLE_OA_SOURCE_INFO) {
> +   enum drm_i915_perf_oa_event_source source =
> +   I915_PERF_OA_EVENT_SOURCE_PERIODIC;
> +
> +   if (copy_to_user(buf, &source, 4))
> +   return -EFAULT;
> +   buf += 4;
> +   }

I think we should keep all sample sections 8 byte aligned to keep any
embedded u64 values we might want within records naturally aligned.
Currently we know OA reports all have power-of-two sizes of 64, 128 or
256 bytes. The _OA_REPORT_LOST and _OA_BUFFER_LOST records only have
an 8 byte header.
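
A quick sketch of the size arithmetic I have in mind, assuming an 8 byte record header followed by the optional source section and then the raw report. The helper names (`sample_align()`, `sample_report_offset()`, `sample_record_size()`) are hypothetical and only illustrate the padding rule.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_SECTION_ALIGN	8u	/* every section starts 8 byte aligned */
#define RECORD_HEADER_SIZE	8u	/* assumed size of the record header */

/* Round a section size up to the next multiple of 8 bytes. */
static size_t sample_align(size_t n)
{
	return (n + SAMPLE_SECTION_ALIGN - 1) &
	       ~(size_t)(SAMPLE_SECTION_ALIGN - 1);
}

/* Offset of the OA report within a record. */
static size_t sample_report_offset(int with_source)
{
	size_t off = RECORD_HEADER_SIZE;

	/* The source is a 32 bit value but occupies a padded 8 byte slot. */
	if (with_source)
		off += sample_align(sizeof(uint32_t));

	return off;
}

/* Total record size for a given report size (64, 128 or 256 bytes). */
static size_t sample_record_size(int with_source, size_t report_size)
{
	return sample_report_offset(with_source) + report_size;
}
```

With the source padded to 8 bytes a 256 byte report starts at offset 16, so any u64 inside the report stays naturally aligned relative to the start of the record.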

'PERIODIC' may not be an accurate differentiator. As mentioned above
with the comment about the Gen8+ 'reason' field there are a number of
different automatic triggers for writing OA reports to the OABUFFER.
Maybe 'OABUFFER' would be better.

I'm not sure the final '_INFO' suffix is necessary here. To me I think
I'd expect richer, more-detailed data with that suffix and I'd expect
the data to be about the source while this is just about identifying the
source. (not too important at this stage - just mentioning how it
comes across when I read it currently).

> +
> if (sample_flags & SAMPLE_OA_REPORT) {
> if (copy_to_user(buf, report, report_size))
> return -EFAULT;
> @@ -2030,6 +2040,11 @@ static int i915_oa_stream_init(struct i915_perf_stream 
> *stream,
> stream->sample_flags |= SAMPLE_OA_REPORT;
> stream->sample_size += format_size;
>
> +   if (props->sample_flags & SAMPLE_OA_SOURCE_INFO) {
> +   stream->sample_flags |= SAMPLE_OA_SOURCE_INFO;
> +   stream->sample_size += 4;
> +   }

8 - as commented above.

> +
> dev_priv->perf.oa.oa_buffer.format_size = format_size;
> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> return -EINVAL;
> @@ -2814,6 +2829,9 @@ static int read_properties_unlocked(struct 
> drm_i915_private *dev_priv,
> props->oa_periodic = true;
> props->oa_period_exponent = value;
> break;
> +   case DRM_I915_PERF_PROP_SAMPLE_OA_SOURCE:
> +   props->sample_flags |= SAMPLE_OA_SOURCE_INFO;
> +   break;
> default:
> MISSING_CASE(id);
> DRM_DEBUG("Unknown i915 perf property ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 835e711..c597e36 100644
> --- a/include/uapi/drm/i915_drm

Re: [Intel-gfx] [PATCH 0/8] Collect command stream based OA reports using i915 perf

2017-03-16 Thread Robert Bragg
On Thu, Mar 16, 2017 at 6:14 AM,   wrote:
> From: Sourab Gupta 
>
> This series adds framework for collection of OA reports associated with the
> render command stream, which are collected around batchbuffer boundaries.
>
> Refloating the series rebased on Robert's latest patch set for
> 'Enabling OA unit for Gen 8 and 9 in i915 perf', which can be found here:
> https://patchwork.freedesktop.org/series/20084/
>
> Since Robert's patches are being reviewed and this patch series extends his
> framework to collect command stream based OA metrics, it would be good to keep
> this work in perspective. Looking to receive feedback (and possibly r-b's :))
> on the series.
>
> Since the OA reports collected are associated with the render command stream, this
> also gives us the ability to collect other metadata such as ctx_id, pid, etc.
> with the samples, and thus we can establish the association of samples
> collected with the corresponding process/workload.
>
> These patches can be found for viewing at
> https://github.com/sourabgu/linux/tree/oa-6march2017
>
> Sourab Gupta (8):
>   drm/i915: Add ctx getparam ioctl parameter to retrieve ctx unique id
>   drm/i915: Expose OA sample source to userspace
>   drm/i915: Framework for capturing command stream based OA reports
>   drm/i915: flush periodic samples, in case of no pending CS sample
> requests
>   drm/i915: Inform userspace about command stream OA buf overflow
>   drm/i915: Populate ctx ID for periodic OA reports
>   drm/i915: Add support for having pid output with OA report
>   drm/i915: Add support for emitting execbuffer tags through OA counter
> reports

Thanks for the updated series Sourab.

I think it could really help to have a pointer to some userspace that
can be used to exercise these new features. Maybe you could look at
adding support to the gputop-csv command line tool which is probably
the simplest, usable userspace for i915 perf we have currently. A
pointer to some work-in-progress IGT tests could be good too.

Br,
- Robert

>
>  drivers/gpu/drm/i915/i915_drv.h|  125 ++-
>  drivers/gpu/drm/i915/i915_gem_context.c|3 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |6 +
>  drivers/gpu/drm/i915/i915_perf.c   | 1149 
> 
>  include/uapi/drm/i915_drm.h|   49 ++
>  5 files changed, 1184 insertions(+), 148 deletions(-)
>
> --
> 1.9.1
>


[Intel-gfx] [PATCH 2/6] drm/i915: expose _SUBSLICE_MASK GETPARM

2017-02-22 Thread Robert Bragg
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices enabled. This information is required,
for example, to be able to analyse some OA counter reports where the
counter configuration depends on the HW sub slice configuration.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index d391768f301b..a497537b42d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -364,6 +364,11 @@ static int i915_getparam(struct drm_device *dev, void 
*data,
if (!value)
return -ENODEV;
break;
+   case I915_PARAM_SUBSLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.subslice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f47fb7f26f36..e0599e729e68 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -393,6 +393,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
 #define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
+#define I915_PARAM_SUBSLICE_MASK 46
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.11.1



[Intel-gfx] [PATCH 0/6] Enable OA unit for Gen 8 and 9 in i915 perf

2017-02-22 Thread Robert Bragg
This extends i915 perf to support periodic sampling of OA metrics for BDW, CHV,
SKL and BXT.

I've recently been working through a number of issues that were uncovered once
I started adapting the gen7 IGT tests. Some further issues were also noticed by
others using INTEL_performance_query in Mesa. So, compared to the earlier Gen8+
OA series sent out, a few notable updates are:

* Implements tail race handling like we have for gen 7
* On SKL we disable automatic reports for slice/unslice clock ratio changes
  because the hardware can generate so many redundant reports (no ratio change)
  that buffer overflows can start to become a problem and userspace ends up
  doing an unreasonable amount of work to keep up.
* WaDisableDopClockGating:bdw is no longer handled in i915_perf.c
* Flex EU register updates within the register state context were using
  incorrect offsets before.
* The generated per-gen config initialization code wasn't initializing the
  flex eu register state.
* While filtering for a single context we weren't forwarding userspace the
  reports it needed to bookend around context switches so that it can discount
  the progress of the counters associated with other contexts.
* Even opening a stream for single context filtered OA metrics requires
  dev.i915.perf_stream_paranoid == 0 since we can no longer reliably hide the
  progress of system-wide metrics, now that filtering is done based on tagging
  reports with an ID instead of physically clock gating the OA unit off for
  other contexts.

In case anyone wants to take a look at the IGT tests so far they can be found
here:

  https://github.com/rib/intel-gpu-tools/commits/wip/rib/i915-perf-tests

I should probably write a few additional tests specific to changes made for
gen8+, but I've at least adapted the existing tests for gen7.

I can only say I've tested on Skylake recently, so although we can hopefully
review these patches with an eye towards upstreaming, they do still need some
broader testing across more hardware.

Considering the large number of configs for Gen 8 and 9, and how unsure I am
about the maturity of all the various configs I do have some doubt about
whether we should upstream them all together if someone hasn't looked at the
unique counters for each set against a relevant workload to see that the
numbers look plausible (so ignoring the common A counters). On the other hand
being optimistic means we can make more sets available faster and even makes it
more likely they will be tested across a broader range of workloads.  We can
add replacement configs if needs be or even stop advertising configs
considering how they are advertised to userspace dynamically.

Note: these patches are based on the patches I sent out to update the tail
pointer race workaround for Haswell in i915 perf.

Regards,
- Robert

Robert Bragg (6):
  drm/i915: expose _SLICE_MASK GETPARM
  drm/i915: expose _SUBSLICE_MASK GETPARM
  drm/i915: Add uncore mmio api for per-context registers
  drm/i915: Add 'render basic' Gen8+ OA unit configs
  drm/i915: Add OA unit support for Gen 8+
  drm/i915: Add more OA configs for BDW, CHV, SKL + BXT

 drivers/gpu/drm/i915/Makefile   |8 +-
 drivers/gpu/drm/i915/i915_drv.c |   10 +
 drivers/gpu/drm/i915/i915_drv.h |   20 +
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_oa_bdw.c  | 4954 +++
 drivers/gpu/drm/i915/i915_oa_bdw.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_bxt.c  | 2541 
 drivers/gpu/drm/i915/i915_oa_bxt.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_chv.c  | 2730 +
 drivers/gpu/drm/i915/i915_oa_chv.h  |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt2.c   | 3303 +
 drivers/gpu/drm/i915/i915_oa_sklgt2.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.c   | 2856 ++
 drivers/gpu/drm/i915/i915_oa_sklgt3.h   |   38 +
 drivers/gpu/drm/i915/i915_oa_sklgt4.c   | 2910 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h   |   38 +
 drivers/gpu/drm/i915/i915_perf.c| 1067 ++-
 drivers/gpu/drm/i915/i915_reg.h |   12 +
 drivers/gpu/drm/i915/intel_lrc.c|4 +
 drivers/gpu/drm/i915/intel_uncore.c |   73 +
 include/uapi/drm/i915_drm.h |   21 +-
 21 files changed, 20667 insertions(+), 71 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode

[Intel-gfx] [PATCH 1/6] drm/i915: expose _SLICE_MASK GETPARM

2017-02-22 Thread Robert Bragg
Enables userspace to determine the number of slices enabled and also
know what specific slices are enabled. This information is required, for
example, to be able to analyse some OA counter reports where the counter
configuration depends on the HW slice configuration.
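
For illustration, userspace could decode the returned mask roughly as follows. This is only a sketch: `mask_count()` and `mask_has()` are hypothetical helpers, and fetching the value through the getparam ioctl is left out.

```c
#include <stdint.h>

/* Number of enabled slices: count the set bits in the mask. */
static unsigned int mask_count(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}

	return n;
}

/* Non-zero if slice 'idx' is enabled in the mask. */
static int mask_has(uint32_t mask, unsigned int idx)
{
	return idx < 32 && ((mask >> idx) & 1u);
}
```

A mask of 0x5 would then mean two slices are enabled, namely slices 0 and 2.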

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b76e8f7ac174..d391768f301b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -359,6 +359,11 @@ static int i915_getparam(struct drm_device *dev, void 
*data,
 */
value = 1;
break;
+   case I915_PARAM_SLICE_MASK:
+   value = INTEL_INFO(dev_priv)->sseu.slice_mask;
+   if (!value)
+   return -ENODEV;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3554495bef13..f47fb7f26f36 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU 38
 #define I915_PARAM_MIN_EU_IN_POOL   39
 #define I915_PARAM_MMAP_GTT_VERSION 40
+#define I915_PARAM_SLICE_MASK   45 /* XXX: rebase before landing */
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
-- 
2.11.1



[Intel-gfx] [PATCH 3/6] drm/i915: Add uncore mmio api for per-context registers

2017-02-22 Thread Robert Bragg
Since the exponent for periodic OA counter sampling is maintained in a
per-context register while we want to treat it as if it were global
state we need to be able to safely issue an mmio write to a per-context
register and affect any currently running context.

We have to take some extra care in this case and this adds a utility
api to encapsulate what's required.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h |  4 ++
 drivers/gpu/drm/i915/i915_reg.h |  3 ++
 drivers/gpu/drm/i915/intel_uncore.c | 73 +
 3 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 105b97bd34d7..c8d03a2e89cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -718,6 +718,7 @@ intel_uncore_forcewake_for_reg(struct drm_i915_private 
*dev_priv,
   i915_reg_t reg, unsigned int op);
 
 struct intel_uncore_funcs {
+   int (*wait_for_rcs_busy)(struct drm_i915_private *dev_priv);
void (*force_wake_get)(struct drm_i915_private *dev_priv,
enum forcewake_domains 
domains);
void (*force_wake_put)(struct drm_i915_private *dev_priv,
@@ -751,6 +752,7 @@ struct intel_uncore {
 
struct intel_uncore_funcs funcs;
 
+   atomic_t hold_rcs_busy_count;
unsigned fifo_count;
 
enum forcewake_domains fw_domains;
@@ -3139,6 +3141,8 @@ static inline bool intel_vgpu_active(struct 
drm_i915_private *dev_priv)
 {
return dev_priv->vgpu.active;
 }
+int intel_uncore_begin_ctx_mmio(struct drm_i915_private *dev_priv);
+void intel_uncore_end_ctx_mmio(struct drm_i915_private *dev_priv);
 
 void
 i915_enable_pipestat(struct drm_i915_private *dev_priv, enum pipe pipe,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 141a5c1e3895..94d40e82edc1 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2313,6 +2313,9 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE   (1<<10)
 
+#define GEN6_RCS_PWR_FSM _MMIO(0x22ac)
+#define GEN9_RCS_FE_FSM2 _MMIO(0x22a4)
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT _MMIO(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0  (1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
b/drivers/gpu/drm/i915/intel_uncore.c
index 441c51fd9746..06bfe5f89ac5 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1274,6 +1274,16 @@ static void intel_uncore_fw_domains_init(struct 
drm_i915_private *dev_priv)
WARN_ON(dev_priv->uncore.fw_domains == 0);
 }
 
+static int gen8_wait_for_rcs_busy(struct drm_i915_private *dev_priv)
+{
+   return wait_for((I915_READ(GEN6_RCS_PWR_FSM) & 0x3f) == 0x30, 50);
+}
+
+static int gen9_wait_for_rcs_busy(struct drm_i915_private *dev_priv)
+{
+   return wait_for((I915_READ(GEN9_RCS_FE_FSM2) & 0x3f) == 0x30, 50);
+}
+
 #define ASSIGN_FW_DOMAINS_TABLE(d) \
 { \
dev_priv->uncore.fw_domains_table = \
@@ -1305,6 +1315,8 @@ void intel_uncore_init(struct drm_i915_private *dev_priv)
dev_priv->uncore.funcs.mmio_writel =
gen9_decoupled_write32;
}
+   dev_priv->uncore.funcs.wait_for_rcs_busy =
+   gen9_wait_for_rcs_busy;
break;
case 8:
if (IS_CHERRYVIEW(dev_priv)) {
@@ -1316,6 +1328,8 @@ void intel_uncore_init(struct drm_i915_private *dev_priv)
ASSIGN_WRITE_MMIO_VFUNCS(gen8);
ASSIGN_READ_MMIO_VFUNCS(gen6);
}
+   dev_priv->uncore.funcs.wait_for_rcs_busy =
+   gen8_wait_for_rcs_busy;
break;
case 7:
case 6:
@@ -1858,6 +1872,65 @@ intel_uncore_forcewake_for_reg(struct drm_i915_private *dev_priv,
return fw_domains;
 }
 
+static int hold_rcs_busy(struct drm_i915_private *dev_priv)
+{
+   int ret = 0;
+
+   if (atomic_inc_and_test(&dev_priv->uncore.hold_rcs_busy_count)) {
+   I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
+  _MASKED_BIT_ENABLE(GEN6_PSMI_SLEEP_MSG_DISABLE));
+
+   ret = dev_priv->uncore.funcs.wait_for_rcs_busy(dev_priv);
+   }
+
+   return ret;
+}
+
+static void release_rcs_busy(struct drm_i915_private *dev_priv)
+{
+   if (!atomic_dec_and_test(&dev_priv->uncore.hold_rcs_busy_count)) {
+   I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
+  _MASKED_BIT_DISABLE(GEN6_PSMI_SLEEP_MSG_DISABLE));
+   }
+}
+
+/*
+ * From Broadwell PRM, 3D-Media-GPGPU -> Register Stat

[Intel-gfx] [PATCH 4/6] drm/i915: Add 'render basic' Gen8+ OA unit configs

2017-02-22 Thread Robert Bragg
Adds static OA unit, MUX, B Counter and Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:

 https://github.com/rib/gputop
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/Makefile |   8 +-
 drivers/gpu/drm/i915/i915_drv.h   |   2 +
 drivers/gpu/drm/i915/i915_oa_bdw.c| 371 ++
 drivers/gpu/drm/i915/i915_oa_bdw.h|  38 
 drivers/gpu/drm/i915/i915_oa_bxt.c| 322 +
 drivers/gpu/drm/i915/i915_oa_bxt.h|  38 
 drivers/gpu/drm/i915/i915_oa_chv.c| 214 
 drivers/gpu/drm/i915/i915_oa_chv.h|  38 
 drivers/gpu/drm/i915/i915_oa_sklgt2.c | 228 +
 drivers/gpu/drm/i915/i915_oa_sklgt2.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt3.c | 236 +
 drivers/gpu/drm/i915/i915_oa_sklgt3.h |  38 
 drivers/gpu/drm/i915/i915_oa_sklgt4.c | 247 ++
 drivers/gpu/drm/i915/i915_oa_sklgt4.h |  38 
 14 files changed, 1855 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bdw.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_bxt.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_chv.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt2.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt3.h
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_sklgt4.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b1b580337c7a..65a31450a392 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -127,7 +127,13 @@ i915-y += i915_vgpu.o
 
 # perf code
 i915-y += i915_perf.o \
- i915_oa_hsw.o
+ i915_oa_hsw.o \
+ i915_oa_bdw.o \
+ i915_oa_chv.o \
+ i915_oa_sklgt2.o \
+ i915_oa_sklgt3.o \
+ i915_oa_sklgt4.o \
+ i915_oa_bxt.o
 
 ifeq ($(CONFIG_DRM_I915_GVT),y)
 i915-y += intel_gvt.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c8d03a2e89cc..d5e084b46023 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2437,6 +2437,8 @@ struct drm_i915_private {
int mux_regs_len;
const struct i915_oa_reg *b_counter_regs;
int b_counter_regs_len;
+   const struct i915_oa_reg *flex_regs;
+   int flex_regs_len;
 
struct {
struct i915_vma *vma;
diff --git a/drivers/gpu/drm/i915/i915_oa_bdw.c b/drivers/gpu/drm/i915/i915_oa_bdw.c
new file mode 100644
index ..d5d608ff2a36
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_bdw.c
@@ -0,0 +1,371 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+
+#include "i915_drv.h"
+#include "i915_oa_bdw.h"
+
+enum metric_set_id {
+   METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_bdw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+   { _MMIO(0x2710), 0x },
+   { _MMIO(0x2714), 0x0080 },
+   { _MMIO(0x2720), 0x },
+ 

[Intel-gfx] [PATCH 5/6] drm/i915: Add OA unit support for Gen 8+

2017-02-22 Thread Robert Bragg
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context saved and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
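
To illustrate the CPU-side filtering that context ID tagging allows, here is a minimal sketch. The report layout is an assumption made for illustration only: dword 0 is taken to carry the report ID/reason together with a "context ID valid" bit, and dword 2 the context ID. The bit position, report size and all sketch_* names are hypothetical and are not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define SKETCH_CTX_ID_VALID_BIT (1u << 25) /* assumed "CTX ID valid" bit */
#define SKETCH_REPORT_DWORDS    64u        /* assumed 256-byte reports */

/* True if the report carries a valid context ID equal to ctx_id. */
static bool sketch_report_matches_ctx(const uint32_t *report32, uint32_t ctx_id)
{
	if (!(report32[0] & SKETCH_CTX_ID_VALID_BIT))
		return false;
	return report32[2] == ctx_id;
}

/* Counts the reports in buf that are tagged with ctx_id. */
static size_t sketch_count_ctx_reports(const uint32_t *buf, size_t n_reports,
				       uint32_t ctx_id)
{
	size_t matches = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < n_reports; i++) {
		if (sketch_report_matches_ctx(buf + i * SKETCH_REPORT_DWORDS, ctx_id))
			matches++;
	}
	return matches;
}
```

Note that a filter of this kind only selects which reports to look at. The counter values inside each report still reflect system-wide progress, which is why the paranoid setting matters on Gen8+.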

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h |   14 +
 drivers/gpu/drm/i915/i915_gem_context.h |1 +
 drivers/gpu/drm/i915/i915_perf.c| 1067 +--
 drivers/gpu/drm/i915/i915_reg.h |9 +
 drivers/gpu/drm/i915/intel_lrc.c|4 +
 include/uapi/drm/i915_drm.h |   19 +-
 6 files changed, 1044 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d5e084b46023..b6d617e308af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2443,6 +2443,7 @@ struct drm_i915_private {
struct {
struct i915_vma *vma;
u8 *vaddr;
+   u32 last_ctx_id;
int format;
int format_size;
 
@@ -2512,6 +2513,14 @@ struct drm_i915_private {
} oa_buffer;
 
u32 gen7_latched_oastatus1;
+   u32 ctx_oactxctrl_off;
+   u32 ctx_flexeu0_off;
+
+   /* The RPT_ID/reason field for Gen8+ includes a bit
+* to determine if the CTX ID in the report is valid
+* but the specific bit differs between Gen 8 and 9
+*/
+   u32 gen8_valid_ctx_bit;
 
struct i915_oa_ops ops;
const struct i915_oa_format *oa_formats;
@@ -2823,6 +2832,8 @@ intel_info(const struct drm_i915_private *dev_priv)
 #define IS_KBL_ULX(dev_priv)   (INTEL_DEVID(dev_priv) == 0x590E || \
 INTEL_DEVID(dev_priv) == 0x5915 || \
 INTEL_DEVID(dev_priv) == 0x591E)
+#define IS_SKL_GT2(dev_priv)   (IS_SKYLAKE(dev_priv) && \
+(INTEL_DEVID(dev_priv) & 0x00F0) == 0x0010)
 #define IS_SKL_GT3(dev_priv)   (IS_SKYLAKE(dev_priv) && \
 (INTEL_DEVID(dev_priv) & 0x00F0) == 0x0020)
 #define IS_SKL_GT4(dev_priv)   (IS_SKYLAKE(dev_priv) && \
@@ -3566,6 +3577,9 @@ i915_gem_context_lookup_timeline(struct i915_gem_context *ctx,
 
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,

[Intel-gfx] [PATCH 2/3] drm/i915/perf: better pipeline aged/aging tail updates

2017-02-22 Thread Robert Bragg
This updates the tail pointer race workaround handling to update the
'aged' pointer before looking to start aging a new one. There's the
possibility that there is already new data available and so we can
immediately start aging a new pointer without having to first wait for a
later hrtimer callback (and then another to age).

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 41 ++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 19d0e4222974..d04ebaa8406e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -391,6 +391,29 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
 
now = ktime_get_mono_fast_ns();
 
+   /* Update the aged tail
+*
+* Flip the tail pointer available for read()s once the aging tail is
+* old enough to trust that the corresponding data will be visible to
+* the CPU...
+*
+* Do this before updating the aging pointer in case we may be able to
+* immediately start aging a new pointer too (if new data has become
+* available) without needing to wait for a later hrtimer callback.
+*/
+   if (aging_tail != INVALID_TAIL_PTR &&
+   ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
+OA_TAIL_MARGIN_NSEC)) {
+   aged_idx ^= 1;
+   dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
+
+   aged_tail = aging_tail;
+
+   /* Mark that we need a new pointer to start aging... */
+   dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
+   aging_tail = INVALID_TAIL_PTR;
+   }
+
/* Update the aging tail
 *
 * We throttle aging tail updates until we have a new tail that
@@ -420,24 +443,6 @@ static bool gen7_oa_buffer_check_unlocked(struct drm_i915_private *dev_priv)
}
}
 
-   /* Update the aged tail
-*
-* Flip the tail pointer available for read()s once the aging tail is
-* old enough to trust that the corresponding data will be visible to
-* the CPU...
-*/
-   if (aging_tail != INVALID_TAIL_PTR &&
-   ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) >
-OA_TAIL_MARGIN_NSEC)) {
-   aged_idx ^= 1;
-   dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx;
-
-   aged_tail = aging_tail;
-
-   /* Mark that we need a new pointer to start aging... */
-   dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
-   }
-
spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags);
 
return aged_tail == INVALID_TAIL_PTR ?
-- 
2.11.1



[Intel-gfx] [PATCH 1/3] drm/i915/perf: improve invalid OA format debug message

2017-02-22 Thread Robert Bragg
A minor improvement to debugging output

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 383b5769a851..19d0e4222974 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1898,11 +1898,13 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
break;
case DRM_I915_PERF_PROP_OA_FORMAT:
if (value == 0 || value >= I915_OA_FORMAT_MAX) {
-   DRM_DEBUG("Invalid OA report format\n");
+   DRM_DEBUG("Out-of-range OA report format %llu\n",
+ value);
return -EINVAL;
}
if (!dev_priv->perf.oa.oa_formats[value].size) {
-   DRM_DEBUG("Invalid OA report format\n");
+   DRM_DEBUG("Unsupported OA report format %llu\n",
+ value);
return -EINVAL;
}
props->oa_format = value;
-- 
2.11.1



[Intel-gfx] [PATCH 3/3] drm/i915/perf: rate limit spurious oa report notice

2017-02-22 Thread Robert Bragg
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.
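
As an illustration of the general idea of rate limiting a repeated notice, here is a minimal user-space sketch. It is not the kernel's printk_ratelimit() implementation; the struct, the function name and the burst-per-interval policy are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter: at most `burst` messages per `interval_ns` window. */
struct sketch_ratelimit {
	uint64_t interval_ns;
	uint32_t burst;
	uint64_t window_start_ns;
	uint32_t emitted;
};

/* Returns true if the caller may emit a message at time now_ns. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rl, uint64_t now_ns)
{
	assert(rl != NULL);

	/* Open a fresh window once the current one has expired. */
	if (now_ns - rl->window_start_ns >= rl->interval_ns) {
		rl->window_start_ns = now_ns;
		rl->emitted = 0;
	}

	if (rl->emitted >= rl->burst)
		return false; /* suppress: budget for this window is spent */

	rl->emitted++;
	return true;
}
```

With a high OA sampling frequency, a loop that hits the same invalid-report condition for every report would then log only the first few occurrences per window instead of one line per report.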

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d04ebaa8406e..a901bcd80263 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -632,7 +632,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_NOTE("Skipping spurious, invalid OA report\n");
+   if (printk_ratelimit())
+   DRM_NOTE("Skipping spurious, invalid OA report\n");
continue;
}
 
-- 
2.11.1



[Intel-gfx] [PATCH 0/3] Some minor i915-perf prep changes

2017-02-22 Thread Robert Bragg
A small set of i915 perf changes that could ideally land before the gen8+
patches I hope to send out soon.

These are based on top of the gen7 tail pointer race workaround changes that
were sent out recently.

Robert Bragg (3):
  drm/i915/perf: improve invalid OA format debug message
  drm/i915/perf: better pipeline aged/aging tail updates
  drm/i915/perf: rate limit spurious oa report notice

 drivers/gpu/drm/i915/i915_perf.c | 50 +++-
 1 file changed, 29 insertions(+), 21 deletions(-)

-- 
2.11.1



Re: [Intel-gfx] [PATCH v2] drm/i915: fix for WaDisableDopClockGating:bdw

2017-02-13 Thread Robert Bragg
On Mon, Feb 13, 2017 at 2:28 PM, Ville Syrjälä
<ville.syrj...@linux.intel.com> wrote:
> On Sun, Feb 12, 2017 at 01:32:52PM +0000, Robert Bragg wrote:
>> This workaround for BDW was incomplete as it also requires EUTC clock
>> gating to be disabled via UCGCTL1.
>>
>> v2: read modify write UCGTL1 in broadwell_init_clock_gating (Ville)
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
>
> Reviewed-by: Ville Syrjälä <ville.syrj...@linux.intel.com>
>
> Do we know if this fixes something real? And if so, do we want cc:stable?

I looked at this as I was reviewing what workarounds I need to deal
with for gen8+ OA unit enabling. I had previously been applying this
WA within i915_perf.c with a long standing comment that I should cross
reference with VPG to see which platforms really needed it. So I just
noticed that i915 is now aiming to handle the WA but it looked
incomplete. I can't say I've observed the issue myself but my
understanding is that it may affect OA metrics in some cases.

Br,
- Robert

>
>> ---
>>  drivers/gpu/drm/i915/intel_pm.c | 8 
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 6 +-
>>  2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c 
>> b/drivers/gpu/drm/i915/intel_pm.c
>> index c0b0f5a4b9f1..3c13be8985f1 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -7229,6 +7229,14 @@ static void broadwell_init_clock_gating(struct 
>> drm_i915_private *dev_priv)
>>  | KVM_CONFIG_CHANGE_NOTIFICATION_SELECT);
>>
>>   lpt_init_clock_gating(dev_priv);
>> +
>> + /* WaDisableDopClockGating:bdw
>> +  *
>> +  * Also see the CHICKEN2 write in bdw_init_workarounds() to disable DOP
>> +  * clock gating.
>> +  */
>> + I915_WRITE(GEN6_UCGCTL1,
>> +I915_READ(GEN6_UCGCTL1) | 
>> GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
>>  }
>>
>>  static void haswell_init_clock_gating(struct drm_i915_private *dev_priv)
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index d3d1e64f2498..d93d5f8f02d8 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -825,7 +825,11 @@ static int bdw_init_workarounds(struct intel_engine_cs 
>> *engine)
>>   /* WaDisableThreadStallDopClockGating:bdw (pre-production) */
>>   WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE);
>>
>> - /* WaDisableDopClockGating:bdw */
>> + /* WaDisableDopClockGating:bdw
>> +  *
>> +  * Also see the related UCGTCL1 write in broadwell_init_clock_gating()
>> +  * to disable EUTC clock gating.
>> +  */
>>   WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
>> DOP_CLOCK_GATING_DISABLE);
>>
>> --
>> 2.11.1
>
> --
> Ville Syrjälä
> Intel OTC


[Intel-gfx] [PATCH v2 3/5] drm/i915/perf: avoid read back of head register

2017-02-13 Thread Robert Bragg
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.
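
The consistency check that a software-tracked head makes possible can be sketched as below. This is only an illustration: the buffer size constant and the sketch_* names are assumptions, and the head is treated as an offset relative to the start of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: a power-of-two OA buffer size. */
#define SKETCH_OA_BUFFER_SIZE (16u * 1024u * 1024u)

/*
 * A head that only the driver ever writes should always lie within the
 * buffer and be a multiple of the report size, so anything else points
 * at a driver bug rather than at hardware misbehaviour.
 */
static bool sketch_head_is_valid(uint32_t head, uint32_t report_size)
{
	assert(report_size != 0);
	return head <= SKETCH_OA_BUFFER_SIZE && head % report_size == 0;
}

/* Advance the head by one report, wrapping at the buffer size. */
static uint32_t sketch_advance_head(uint32_t head, uint32_t report_size)
{
	assert(sketch_head_is_valid(head, report_size));
	return (head + report_size) & (SKETCH_OA_BUFFER_SIZE - 1u);
}
```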

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 11 ++
 drivers/gpu/drm/i915/i915_perf.c | 46 ++--
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b5f150b0870d..b990549cfe65 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2428,6 +2428,17 @@ struct drm_i915_private {
u8 *vaddr;
int format;
int format_size;
+
+   /**
+* Although we can always read back the head
+* pointer register, we prefer to avoid
+* trusting the HW state, just to avoid any
+* risk that some hardware condition could
+* somehow bump the head pointer unpredictably
+* and cause us to forward the wrong OA buffer
+* data to userspace.
+*/
+   u32 head;
} oa_buffer;
 
u32 gen7_latched_oastatus1;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4bb7333dac45..e85583d0bcff 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -322,9 +322,8 @@ struct perf_open_properties {
 static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_priv)
 {
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
-   u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   u32 head = dev_priv->perf.oa.oa_buffer.head;
u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
return OA_TAKEN(tail, head) <
@@ -458,16 +457,24 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
return -EIO;
 
head = *head_ptr - gtt_offset;
+
+   /* An out of bounds or misaligned head pointer implies a driver bug
+* since we are in full control of head pointer which should only
+* be incremented by multiples of the report size (notably also
+* all a power of two).
+*/
+   if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size,
+ "Inconsistent OA buffer head pointer = %u\n", head))
+   return -EIO;
+
tail -= gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
-* buffer size and since we should never write a misaligned head
-* pointer we don't expect to read one back either...
+* buffer size
 */
-   if (tail > OA_BUFFER_SIZE || head > OA_BUFFER_SIZE ||
-   head % report_size) {
-   DRM_ERROR("Inconsistent OA buffer pointer (head = %u, tail = %u): force restart\n",
- head, tail);
+   if (tail > OA_BUFFER_SIZE) {
+   DRM_ERROR("Inconsistent OA buffer tail pointer = %u: force restart\n",
+ tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
*head_ptr = I915_READ(GEN7_OASTATUS2) &
@@ -562,8 +569,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
-   int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2;
u32 oastatus1;
u32 head;
u32 tail;
@@ -572,10 +577,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
return -EIO;
 
-   oastatus2 = I915_READ(GEN7_OASTATUS2);
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   head = dev_priv->perf.oa.oa_buffer.head;
tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
/* XXX: On Haswell we don't have a safe way to clear oastatus1
@@ -616,10 +620,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
   

[Intel-gfx] [PATCH v2 0/5] drm/i915/perf: Improve handling of OA tail race

2017-02-13 Thread Robert Bragg
Folds in Matthew Auld's feedback and adds his RBs.

Robert Bragg (5):
  drm/i915/perf: fix gen7_append_oa_reports comment
  drm/i915/perf: avoid poll, read, EAGAIN busy loops
  drm/i915/perf: avoid read back of head register
  drm/i915/perf: no head/tail ref in gen7_oa_read
  drm/i915/perf: improve tail race workaround

 drivers/gpu/drm/i915/i915_drv.h  |  71 +++-
 drivers/gpu/drm/i915/i915_perf.c | 344 ---
 2 files changed, 281 insertions(+), 134 deletions(-)

-- 
2.11.1



[Intel-gfx] [PATCH v2 4/5] drm/i915/perf: no head/tail ref in gen7_oa_read

2017-02-13 Thread Robert Bragg
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.
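
The "only write the head back if something was copied" behaviour can be sketched roughly as follows. This is a simplified user-space model under stated assumptions: the buffer size, the state struct and the write counter standing in for the register write are all hypothetical, and no data is actually copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUF_SIZE 4096u /* assumed power-of-two buffer size */

/* Hypothetical driver-side state. */
struct sketch_oa_state {
	uint32_t head;           /* software-tracked read position */
	unsigned hw_head_writes; /* stands in for head register writes */
};

/*
 * Consume whole reports between head and tail while the destination has
 * room. Returns the number of bytes consumed; the head (and the simulated
 * register) is only updated when that number is non-zero.
 */
static size_t sketch_append_reports(struct sketch_oa_state *st, uint32_t tail,
				    uint32_t report_size, size_t space)
{
	const uint32_t mask = SKETCH_BUF_SIZE - 1u;
	uint32_t head = st->head;
	size_t copied = 0;

	assert(report_size != 0);

	while (((tail - head) & mask) >= report_size &&
	       space - copied >= report_size) {
		/* A real driver would copy one report to userspace here. */
		head = (head + report_size) & mask;
		copied += report_size;
	}

	if (copied) {
		st->head = head;
		st->hw_head_writes++;
	}
	return copied;
}
```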

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 51 +++-
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e85583d0bcff..4b3babdbd79e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -420,8 +420,6 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @head_ptr: (inout): the current oa buffer cpu read position
- * @tail: the current oa buffer gpu write position
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -439,9 +437,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  char __user *buf,
  size_t count,
- size_t *offset,
- u32 *head_ptr,
- u32 tail)
+ size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -449,14 +445,15 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
int tail_margin = dev_priv->perf.oa.tail_margin;
u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
-   u32 head;
+   size_t start_offset = *offset;
+   u32 head, oastatus1, tail;
u32 taken;
int ret = 0;
 
if (WARN_ON(!stream->enabled))
return -EIO;
 
-   head = *head_ptr - gtt_offset;
+   head = dev_priv->perf.oa.oa_buffer.head - gtt_offset;
 
/* An out of bounds or misaligned head pointer implies a driver bug
 * since we are in full control of head pointer which should only
@@ -467,7 +464,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  "Inconsistent OA buffer head pointer = %u\n", head))
return -EIO;
 
-   tail -= gtt_offset;
+   oastatus1 = I915_READ(GEN7_OASTATUS1);
+   tail = (oastatus1 & GEN7_OASTATUS1_TAIL_MASK) - gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
 * buffer size
@@ -477,8 +475,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
-   *head_ptr = I915_READ(GEN7_OASTATUS2) &
-   GEN7_OASTATUS2_HEAD_MASK;
return -EIO;
}
 
@@ -542,7 +538,18 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
report32[0] = 0;
}
 
-   *head_ptr = gtt_offset + head;
+
+   if (start_offset != *offset) {
+   /* We removed the gtt_offset for the copy loop above, indexing
+* relative to oa_buf_base so put back here...
+*/
+   head += gtt_offset;
+
+   I915_WRITE(GEN7_OASTATUS2,
+  ((head & GEN7_OASTATUS2_HEAD_MASK) |
+   OA_MEM_SELECT_GGTT));
+   dev_priv->perf.oa.oa_buffer.head = head;
+   }
 
return ret;
 }
@@ -570,8 +577,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
u32 oastatus1;
-   u32 head;
-   u32 tail;
int ret;
 
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
@@ -579,9 +584,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = dev_priv->perf.oa.oa_buffer.head;
-   tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
-
/* XXX: On Haswell we don't have a safe way to clear oastatus1
 

[Intel-gfx] [PATCH v2 5/5] drm/i915/perf: improve tail race workaround

2017-02-13 Thread Robert Bragg
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
 1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
 2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.
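
The aging/aged scheme described above can be sketched as a small state machine. This is a simplified model rather than the driver code: it keeps two plain fields instead of the driver's own bookkeeping, ignores locking, and the margin value and all sketch_* names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_TAIL   UINT32_MAX
#define SKETCH_TAIL_MARGIN_NS 100000u /* assumed margin, illustration only */

struct sketch_tails {
	uint32_t aged_tail;          /* trusted, usable by read() */
	uint32_t aging_tail;         /* seen in HW, not yet trusted */
	uint64_t aging_timestamp_ns; /* when aging_tail was sampled */
};

/*
 * Periodic check, e.g. from a timer callback. Promotes the aging tail
 * once it is older than the margin, then starts aging a newer hardware
 * tail if there is one. Returns true if trusted data is available.
 */
static bool sketch_oa_buffer_check(struct sketch_tails *t, uint32_t hw_tail,
				   uint32_t head, uint64_t now_ns)
{
	assert(t != NULL);

	if (t->aging_tail != SKETCH_INVALID_TAIL &&
	    now_ns - t->aging_timestamp_ns > SKETCH_TAIL_MARGIN_NS) {
		t->aged_tail = t->aging_tail;
		t->aging_tail = SKETCH_INVALID_TAIL;
	}

	if (t->aging_tail == SKETCH_INVALID_TAIL && hw_tail != t->aged_tail) {
		t->aging_tail = hw_tail;
		t->aging_timestamp_ns = now_ns;
	}

	return t->aged_tail != SKETCH_INVALID_TAIL && t->aged_tail != head;
}
```

Because the decision depends only on elapsed wall-clock time since a tail value was sampled, the sketch (like the scheme it models) works the same whether reports are periodic or generated by events such as context switches.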

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  60 -
 drivers/gpu/drm/i915/i915_perf.c | 277 ++-
 2 files changed, 241 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b990549cfe65..dfd4d9b299b7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2051,7 +2051,7 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_is_empty: Check if OA buffer empty (false positives OK)
+* @oa_buffer_check: Check for OA buffer data + update tail
 *
 * This is either called via fops or the poll check hrtimer (atomic
 * ctx) without any locks taken.
@@ -2064,7 +2064,7 @@ struct i915_oa_ops {
 * here, which will be handled gracefully - likely resulting in an
 * %EAGAIN error for userspace.
 */
-   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+   bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
 };
 
 struct intel_cdclk_state {
@@ -2412,9 +2412,6 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
-   int timestamp_frequency;
-
-   int tail_margin;
 
int metrics_set;
 
@@ -2430,6 +2427,59 @@ struct drm_i915_private {
int format_size;
 
/**
+* Locks reads and writes to all head/tail state
+*
+* Consider: the head and tail pointer state
+* needs to be read consistently from a hrtimer
+* callback (atomic context) and read() fop
+* (user context) with tail pointer updates
+* happening in atomic context and head updates
+* in user context and the (unlikely)
+* possibility of read() errors needing to
+* reset all head/tail state.
+*
+* Note: Contention or performance aren't
+* currently a significant concern here
+* considering the relatively low frequency of
+* hrtimer callbacks (5ms period) and that
+* reads typically only happen in response to a
+* hrtimer event and likely complete before the
+* n

[Intel-gfx] [PATCH v2 2/5] drm/i915/perf: avoid poll, read, EAGAIN busy loops

2017-02-13 Thread Robert Bragg
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.
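
A minimal sketch of the throttling described above, modelled in plain userspace C rather than taken from the driver. The struct and function names (`sketch_oa`, `sketch_timer_tick`, `sketch_poll_ready`, `sketch_read`) are hypothetical; only the `ret >= 0 || ret == -EAGAIN` condition comes from the patch itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the driver's OA state. */
struct sketch_oa {
	bool pollin;       /* latched by the periodic timer callback */
	bool really_ready; /* what the accurate check inside read() finds */
};

/* Periodic timer callback: a cheap check that may give false positives. */
static void sketch_timer_tick(struct sketch_oa *oa, bool cheap_check_says_data)
{
	if (cheap_check_says_data)
		oa->pollin = true;
}

/* poll(): only reports what the timer callback latched. */
static bool sketch_poll_ready(const struct sketch_oa *oa)
{
	return oa->pollin;
}

/* read(): clears the latched state on success and also on -EAGAIN, so a
 * false positive cannot be reported again before the next timer tick.
 */
static ssize_t sketch_read(struct sketch_oa *oa)
{
	ssize_t ret = oa->really_ready ? 1 : -EAGAIN;

	if (ret >= 0 || ret == -EAGAIN)
		oa->pollin = false;

	return ret;
}
```

With this shape, a poll() that follows a failed read() stays quiet until `sketch_timer_tick()` runs again, which is what bounds the retry rate to the timer period.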

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b0eec762b9b4..4bb7333dac45 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1352,7 +1352,15 @@ static ssize_t i915_perf_read(struct file *file,
mutex_unlock(&dev_priv->perf.lock);
}
 
-   if (ret >= 0) {
+   /* We allow the poll checking to sometimes report false positive POLLIN
+* events where we might actually report EAGAIN on read() if there's
+* not really any data available. In this situation though we don't
+* want to enter a busy loop between poll() reporting a POLLIN event
+* and read() returning -EAGAIN. Clearing the oa.pollin state here
+* effectively ensures we back off until the next hrtimer callback
+* before reporting another POLLIN event.
+*/
+   if (ret >= 0 || ret == -EAGAIN) {
/* Maybe make ->pollin per-stream state if we support multiple
 * concurrent streams in the future.
 */
-- 
2.11.1



[Intel-gfx] [PATCH v2 1/5] drm/i915/perf: fix gen7_append_oa_reports comment

2017-02-13 Thread Robert Bragg
If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.
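
For readers unfamiliar with the convention the comment refers to, here is a small sketch of a power-of-two ring buffer where the producer appends at the tail and the consumer reads from the head. The buffer size and the helper names are illustrative assumptions, not the driver's definitions.

```c
#include <stdint.h>

/* Illustrative size; it must be a power of two for the masking to work. */
#define SKETCH_RING_SIZE 4096u

/* Bytes waiting to be consumed: the distance from head to tail. */
static uint32_t sketch_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & (SKETCH_RING_SIZE - 1);
}

/* The producer (the hardware, in the OA case) appends at the tail. */
static uint32_t sketch_advance_tail(uint32_t tail, uint32_t bytes)
{
	return (tail + bytes) & (SKETCH_RING_SIZE - 1);
}

/* The consumer (the CPU reader) takes data from the head. */
static uint32_t sketch_advance_head(uint32_t head, uint32_t bytes)
{
	return (head + bytes) & (SKETCH_RING_SIZE - 1);
}
```

The unsigned subtraction in `sketch_taken()` keeps the result correct when the tail has wrapped past the end of the buffer and the head has not.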

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a1b7eec58be2..b0eec762b9b4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -431,7 +431,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * userspace.
  *
  * Note: reports are consumed from the head, and appended to the
- * tail, so the head chases the tail?... If you think that's mad
+ * tail, so the tail chases the head?... If you think that's mad
  * and back-to-front you're not alone, but this follows the
  * Gen PRM naming convention.
  *
-- 
2.11.1



[Intel-gfx] [PATCH v2] drm/i915: fix for WaDisableDopClockGating:bdw

2017-02-12 Thread Robert Bragg
This workaround for BDW was incomplete as it also requires EUTC clock
gating to be disabled via UCGCTL1.

v2: read-modify-write UCGCTL1 in broadwell_init_clock_gating() (Ville)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 8 
 drivers/gpu/drm/i915/intel_ringbuffer.c | 6 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c0b0f5a4b9f1..3c13be8985f1 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7229,6 +7229,14 @@ static void broadwell_init_clock_gating(struct drm_i915_private *dev_priv)
   | KVM_CONFIG_CHANGE_NOTIFICATION_SELECT);
 
lpt_init_clock_gating(dev_priv);
+
+   /* WaDisableDopClockGating:bdw
+*
+* Also see the CHICKEN2 write in bdw_init_workarounds() to disable DOP
+* clock gating.
+*/
+   I915_WRITE(GEN6_UCGCTL1,
+  I915_READ(GEN6_UCGCTL1) | GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
 }
 
 static void haswell_init_clock_gating(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d3d1e64f2498..d93d5f8f02d8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -825,7 +825,11 @@ static int bdw_init_workarounds(struct intel_engine_cs *engine)
/* WaDisableThreadStallDopClockGating:bdw (pre-production) */
WA_SET_BIT_MASKED(GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE);
 
-   /* WaDisableDopClockGating:bdw */
+   /* WaDisableDopClockGating:bdw
+*
+* Also see the related UCGCTL1 write in broadwell_init_clock_gating()
+* to disable EUTC clock gating.
+*/
WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
  DOP_CLOCK_GATING_DISABLE);
 
-- 
2.11.1



Re: [Intel-gfx] [PATCH] drm/i915: fix for WaDisableDopClockGating:bdw

2017-02-12 Thread Robert Bragg
On Wed, Feb 8, 2017 at 6:33 PM, Ville Syrjälä
<ville.syrj...@linux.intel.com> wrote:
> On Wed, Feb 08, 2017 at 06:10:31PM +0000, Robert Bragg wrote:
>> This workaround for BDW was incomplete as it also requires EUTC clock
>> gating to be disabled via UCGCTL1.
>
> IIRC that matches what I told Ben years ago when the w/a was first
> being added, and matches what I put in the CHV code when it still
> had this w/a. Presumably it still holds for BDW.

Poking into this, I now see that 3e470eaaee5, which removed the
corresponding pre-production w/a for CHV, also removed a write to
UCGCTL1 that is missing for BDW.

>
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> ---
>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 49fa8006c6a2..fa1b400a79d0 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -776,6 +776,7 @@ static int bdw_init_workarounds(struct intel_engine_cs *engine)
>>   /* WaDisableDopClockGating:bdw */
>>   WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
>> DOP_CLOCK_GATING_DISABLE);
>> + WA_SET_BIT(GEN6_UCGCTL1, GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
>
> The UCGCTL registers aren't clobbered by GPU resets and whatnot, so
> we've historically kept them in the init_clock_gating() side.

Okay, I'll move it there.

>
> Also it's not a masked register, and I'm not sure all the bits
> default to 0, so perhaps you shouldn't just clobber them. Sadly the
> spec seems to have gone mad and no longer shows the default values
> for registers so I can't double check right now.

Ah, oops, I had a read modify write in i915-perf previously but yeah
this is no good. Moving this to broadwell_init_clock_gating() as a
read modify write is hopefully ok.
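
To make the distinction discussed above concrete, here is a small model of the two register styles in plain C. It is a sketch under the assumption that a masked register uses the upper 16 bits of the written value to select which of the lower 16 bits get updated; the helper names are hypothetical and the registers are plain variables, not MMIO.

```c
#include <stdint.h>

/* Masked register: the upper 16 bits of the written value select which of
 * the lower 16 bits are updated; unselected bits keep their old value.
 */
static uint32_t sketch_masked_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;

	return ((reg & ~mask) | (val & mask)) & 0xffffu;
}

/* Build a value that sets 'bit' in a masked register. */
static uint32_t sketch_masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

/* Plain (non-masked) register: a write replaces every bit. */
static uint32_t sketch_plain_write(uint32_t reg, uint32_t val)
{
	(void)reg; /* the old contents are simply lost */
	return val;
}

/* Read-modify-write: preserve whatever bits were already set. */
static uint32_t sketch_rmw_set_bits(uint32_t reg, uint32_t bits)
{
	return sketch_plain_write(reg, reg | bits);
}
```

Writing a single bit to a plain register without reading it first discards any bits that were set by default, which is the clobbering concern raised in the review.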

Thanks for the notes.
- Robert

>
>>
>>   WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3,
>> GEN8_SAMPLER_POWER_BYPASS_DIS);
>> --
>> 2.11.0
>>
>> ___
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Ville Syrjälä
> Intel OTC


[Intel-gfx] [PATCH] drm/i915: fix for WaDisableDopClockGating:bdw

2017-02-08 Thread Robert Bragg
This workaround for BDW was incomplete as it also requires EUTC clock
gating to be disabled via UCGCTL1.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 49fa8006c6a2..fa1b400a79d0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -776,6 +776,7 @@ static int bdw_init_workarounds(struct intel_engine_cs *engine)
/* WaDisableDopClockGating:bdw */
WA_SET_BIT_MASKED(GEN7_ROW_CHICKEN2,
  DOP_CLOCK_GATING_DISABLE);
+   WA_SET_BIT(GEN6_UCGCTL1, GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE);
 
WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3,
  GEN8_SAMPLER_POWER_BYPASS_DIS);
-- 
2.11.0



[Intel-gfx] [PATCH v2 2/5] drm/i915/perf: avoid poll, read, EAGAIN busy loops

2017-01-27 Thread Robert Bragg
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b0eec762b9b4..4bb7333dac45 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1352,7 +1352,15 @@ static ssize_t i915_perf_read(struct file *file,
mutex_unlock(&dev_priv->perf.lock);
}
 
-   if (ret >= 0) {
+   /* We allow the poll checking to sometimes report false positive POLLIN
+* events where we might actually report EAGAIN on read() if there's
+* not really any data available. In this situation though we don't
+* want to enter a busy loop between poll() reporting a POLLIN event
+* and read() returning -EAGAIN. Clearing the oa.pollin state here
+* effectively ensures we back off until the next hrtimer callback
+* before reporting another POLLIN event.
+*/
+   if (ret >= 0 || ret == -EAGAIN) {
/* Maybe make ->pollin per-stream state if we support multiple
 * concurrent streams in the future.
 */
-- 
2.11.0



[Intel-gfx] [PATCH v2 3/5] drm/i915/perf: avoid read back of head register

2017-01-27 Thread Robert Bragg
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.
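
A rough sketch of the idea, assuming a software-owned head that only ever advances by whole reports. The buffer size and all names here are illustrative assumptions; the consistency test mirrors the `head > OA_BUFFER_SIZE || head % report_size` condition in the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative power-of-two size; not the driver's definition. */
#define SKETCH_BUFFER_SIZE (16u * 1024u * 1024u)

struct sketch_oa_buffer {
	uint32_t head; /* software-owned copy, never read back from hardware */
};

/* An out-of-bounds or misaligned head can only come from a driver bug,
 * because software is the only writer of this value.
 */
static bool sketch_head_is_consistent(uint32_t head, uint32_t report_size)
{
	return head <= SKETCH_BUFFER_SIZE && (head % report_size) == 0;
}

/* Advance the head by whole reports only, wrapping at the buffer size. */
static void sketch_consume_reports(struct sketch_oa_buffer *buf,
				   uint32_t n_reports, uint32_t report_size)
{
	buf->head = (buf->head + n_reports * report_size) &
		    (SKETCH_BUFFER_SIZE - 1);
}
```

Because the head never comes from a register read, a failed consistency check points at the code that updated it rather than at the hardware.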

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 11 ++
 drivers/gpu/drm/i915/i915_perf.c | 46 ++--
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 38509505424d..a7e8936ce0b0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2397,6 +2397,17 @@ struct drm_i915_private {
u8 *vaddr;
int format;
int format_size;
+
+   /**
+* Although we can always read back the head
+* pointer register, we prefer to avoid
+* trusting the HW state, just to avoid any
+* risk that some hardware condition could
+* somehow bump the head pointer unpredictably
+* and cause us to forward the wrong OA buffer
+* data to userspace.
+*/
+   u32 head;
} oa_buffer;
 
u32 gen7_latched_oastatus1;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4bb7333dac45..e85583d0bcff 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -322,9 +322,8 @@ struct perf_open_properties {
 static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_priv)
 {
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
-   u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   u32 head = dev_priv->perf.oa.oa_buffer.head;
u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
return OA_TAKEN(tail, head) <
@@ -458,16 +457,24 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
return -EIO;
 
head = *head_ptr - gtt_offset;
+
+   /* An out of bounds or misaligned head pointer implies a driver bug
+* since we are in full control of head pointer which should only
+* be incremented by multiples of the report size (notably also
+* all a power of two).
+*/
+   if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size,
+ "Inconsistent OA buffer head pointer = %u\n", head))
+   return -EIO;
+
tail -= gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
-* buffer size and since we should never write a misaligned head
-* pointer we don't expect to read one back either...
+* buffer size
 */
-   if (tail > OA_BUFFER_SIZE || head > OA_BUFFER_SIZE ||
-   head % report_size) {
-   DRM_ERROR("Inconsistent OA buffer pointer (head = %u, tail = %u): force restart\n",
- head, tail);
+   if (tail > OA_BUFFER_SIZE) {
+   DRM_ERROR("Inconsistent OA buffer tail pointer = %u: force restart\n",
+ tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
*head_ptr = I915_READ(GEN7_OASTATUS2) &
@@ -562,8 +569,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
-   int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2;
u32 oastatus1;
u32 head;
u32 tail;
@@ -572,10 +577,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
return -EIO;
 
-   oastatus2 = I915_READ(GEN7_OASTATUS2);
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   head = dev_priv->perf.oa.oa_buffer.head;
tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
/* XXX: On Haswell we don't have a safe way to clear oastatus1
@@ -616,10 +620,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
   

[Intel-gfx] [PATCH v2 5/5] drm/i915/perf: improve tail race workaround

2017-01-27 Thread Robert Bragg
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
 1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
 2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.
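
The two-pointer scheme can be sketched roughly as follows. This is a simplified userspace model, not the driver's data layout: the struct, the function name, and the margin value are assumptions, locking is omitted, and timestamps are passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative margin; stands in for the driver's OA_TAIL_MARGIN_NSEC. */
#define SKETCH_TAIL_MARGIN_NS 100000ull

struct sketch_tails {
	uint32_t aged_tail;   /* safe to use for read() */
	uint32_t aging_tail;  /* sampled from hardware, not yet trusted */
	uint64_t aging_since; /* time at which aging_tail was sampled */
	bool aging_valid;     /* true while a tail is being aged */
};

/* Called from the periodic check with the current hardware tail and time.
 * Returns true when there is data between head and the aged tail.
 */
static bool sketch_buffer_check(struct sketch_tails *t, uint32_t hw_tail,
				uint64_t now_ns, uint32_t head)
{
	/* Promote the aging tail once it has been visible for long enough. */
	if (t->aging_valid &&
	    now_ns - t->aging_since >= SKETCH_TAIL_MARGIN_NS) {
		t->aged_tail = t->aging_tail;
		t->aging_valid = false;
	}

	/* Start aging a newer tail if the hardware has moved on. */
	if (!t->aging_valid && hw_tail != t->aged_tail) {
		t->aging_tail = hw_tail;
		t->aging_since = now_ns;
		t->aging_valid = true;
	}

	return t->aged_tail != head;
}
```

Note that the margin is measured against elapsed time rather than a byte count, so the sketch behaves the same whether reports arrive periodically or not.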

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  60 -
 drivers/gpu/drm/i915/i915_perf.c | 277 ++-
 2 files changed, 241 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a7e8936ce0b0..f34b7f5022fc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2038,7 +2038,7 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_is_empty: Check if OA buffer empty (false positives OK)
+* @oa_buffer_check: Check for OA buffer data + update tail
 *
 * This is either called via fops or the poll check hrtimer (atomic
 * ctx) without any locks taken.
@@ -2051,7 +2051,7 @@ struct i915_oa_ops {
 * here, which will be handled gracefully - likely resulting in an
 * %EAGAIN error for userspace.
 */
-   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+   bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
 };
 
 struct drm_i915_private {
@@ -2381,9 +2381,6 @@ struct drm_i915_private {
 
bool periodic;
int period_exponent;
-   int timestamp_frequency;
-
-   int tail_margin;
 
int metrics_set;
 
@@ -2399,6 +2396,59 @@ struct drm_i915_private {
int format_size;
 
/**
+* Locks reads and writes to all head/tail state
+*
+* Consider: the head and tail pointer state
+* needs to be read consistently from a hrtimer
+* callback (atomic context) and read() fop
+* (user context) with tail pointer updates
+* happening in atomic context and head updates
+* in user context and the (unlikely)
+* possibility of read() errors needing to
+* reset all head/tail state.
+*
+* Note: Contention or performance aren't
+* currently a significant concern here
+* considering the relatively low frequency of
+* hrtimer callbacks (5ms period) and that
+* reads typically only happen in response to a
+* hrtimer event and likely complete before the
+* n

[Intel-gfx] [PATCH v2 4/5] drm/i915/perf: no head/tail ref in gen7_oa_read

2017-01-27 Thread Robert Bragg
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.
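
The "only write the head back if something was copied" behaviour can be sketched like this. It is a simplified model assuming a fake register held in a struct field; the names are hypothetical, buffer wrap-around is ignored, and `*offset` is assumed never to exceed `count`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the OA state; head_reg models the register. */
struct sketch_oa_regs {
	uint32_t head;           /* software copy of the head */
	uint32_t head_reg;       /* pretend hardware head register */
	unsigned int reg_writes; /* counts writes to head_reg */
};

/* Copy whole reports while both the source and destination have room,
 * then update the head only if at least one report was copied.
 */
static void sketch_append_reports(struct sketch_oa_regs *oa, size_t *offset,
				  size_t count, uint32_t available,
				  uint32_t report_size)
{
	size_t start_offset = *offset;
	uint32_t head = oa->head;

	while (available >= report_size && count - *offset >= report_size) {
		*offset += report_size; /* stands in for the copy to userspace */
		head += report_size;
		available -= report_size;
	}

	if (start_offset != *offset) {
		oa->head_reg = head;
		oa->reg_writes++;
		oa->head = head;
	}
}
```

Comparing `*offset` before and after the loop is enough to tell whether anything was forwarded, so no separate flag is needed.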

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 51 +++-
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e85583d0bcff..4b3babdbd79e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -420,8 +420,6 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @head_ptr: (inout): the current oa buffer cpu read position
- * @tail: the current oa buffer gpu write position
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -439,9 +437,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  char __user *buf,
  size_t count,
- size_t *offset,
- u32 *head_ptr,
- u32 tail)
+ size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -449,14 +445,15 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
int tail_margin = dev_priv->perf.oa.tail_margin;
u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
-   u32 head;
+   size_t start_offset = *offset;
+   u32 head, oastatus1, tail;
u32 taken;
int ret = 0;
 
if (WARN_ON(!stream->enabled))
return -EIO;
 
-   head = *head_ptr - gtt_offset;
+   head = dev_priv->perf.oa.oa_buffer.head - gtt_offset;
 
/* An out of bounds or misaligned head pointer implies a driver bug
 * since we are in full control of head pointer which should only
@@ -467,7 +464,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  "Inconsistent OA buffer head pointer = %u\n", head))
return -EIO;
 
-   tail -= gtt_offset;
+   oastatus1 = I915_READ(GEN7_OASTATUS1);
+   tail = (oastatus1 & GEN7_OASTATUS1_TAIL_MASK) - gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
 * buffer size
@@ -477,8 +475,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
-   *head_ptr = I915_READ(GEN7_OASTATUS2) &
-   GEN7_OASTATUS2_HEAD_MASK;
return -EIO;
}
 
@@ -542,7 +538,18 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
report32[0] = 0;
}
 
-   *head_ptr = gtt_offset + head;
+
+   if (start_offset != *offset) {
+   /* We removed the gtt_offset for the copy loop above, indexing
+* relative to oa_buf_base so put back here...
+*/
+   head += gtt_offset;
+
+   I915_WRITE(GEN7_OASTATUS2,
+  ((head & GEN7_OASTATUS2_HEAD_MASK) |
+   OA_MEM_SELECT_GGTT));
+   dev_priv->perf.oa.oa_buffer.head = head;
+   }
 
return ret;
 }
@@ -570,8 +577,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
u32 oastatus1;
-   u32 head;
-   u32 tail;
int ret;
 
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
@@ -579,9 +584,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = dev_priv->perf.oa.oa_buffer.head;
-   tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
-
/* XXX: On Haswell we don't have a safe way to clear oastatus1
 

[Intel-gfx] [PATCH v2 1/5] drm/i915/perf: fix gen7_append_oa_reports comment

2017-01-27 Thread Robert Bragg
If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a1b7eec58be2..b0eec762b9b4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -431,7 +431,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * userspace.
  *
  * Note: reports are consumed from the head, and appended to the
- * tail, so the head chases the tail?... If you think that's mad
+ * tail, so the tail chases the head?... If you think that's mad
  * and back-to-front you're not alone, but this follows the
  * Gen PRM naming convention.
  *
-- 
2.11.0



[Intel-gfx] [PATCH v2 0/5] drm/i915/perf: Improve handling of OA tail race

2017-01-27 Thread Robert Bragg
Folds in Matthew Auld's feedback; thanks.

Robert Bragg (5):
  drm/i915/perf: fix gen7_append_oa_reports comment
  drm/i915/perf: avoid poll, read, EAGAIN busy loops
  drm/i915/perf: avoid read back of head register
  drm/i915/perf: no head/tail ref in gen7_oa_read
  drm/i915/perf: improve tail race workaround

 drivers/gpu/drm/i915/i915_drv.h  |  71 +++-
 drivers/gpu/drm/i915/i915_perf.c | 344 ---
 2 files changed, 281 insertions(+), 134 deletions(-)

-- 
2.11.0



[Intel-gfx] [PATCH 3/5] drm/i915/perf: avoid read back of head register

2017-01-23 Thread Robert Bragg
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  | 11 ++
 drivers/gpu/drm/i915/i915_perf.c | 46 ++--
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 38509505424d..e732d0b3bf65 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2397,6 +2397,17 @@ struct drm_i915_private {
u8 *vaddr;
int format;
int format_size;
+
+   /**
+* Although we can always read back the head
+* pointer register, we prefer to avoid
+* trusting the HW state, just to avoid any
+* risk that some hardware condition could
+* somehow bump the head pointer unpredictably
+* and cause us to forward the wrong OA buffer
+* data to userspace.
+*/
+   u32 head;
} oa_buffer;
 
u32 gen7_latched_oastatus1;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4bb7333dac45..e85583d0bcff 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -322,9 +322,8 @@ struct perf_open_properties {
 static bool gen7_oa_buffer_is_empty_fop_unlocked(struct drm_i915_private *dev_priv)
 {
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
-   u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   u32 head = dev_priv->perf.oa.oa_buffer.head;
u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
return OA_TAKEN(tail, head) <
@@ -458,16 +457,24 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
return -EIO;
 
head = *head_ptr - gtt_offset;
+
+   /* An out of bounds or misaligned head pointer implies a driver bug
+* since we are in full control of head pointer which should only
+* be incremented by multiples of the report size (notably also
+* all a power of two).
+*/
+   if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size,
+ "Inconsistent OA buffer head pointer = %u\n", head))
+   return -EIO;
+
tail -= gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
-* buffer size and since we should never write a misaligned head
-* pointer we don't expect to read one back either...
+* buffer size
 */
-   if (tail > OA_BUFFER_SIZE || head > OA_BUFFER_SIZE ||
-   head % report_size) {
-   DRM_ERROR("Inconsistent OA buffer pointer (head = %u, tail = %u): force restart\n",
- head, tail);
+   if (tail > OA_BUFFER_SIZE) {
+   DRM_ERROR("Inconsistent OA buffer tail pointer = %u: force restart\n",
+ tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
*head_ptr = I915_READ(GEN7_OASTATUS2) &
@@ -562,8 +569,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
-   int report_size = dev_priv->perf.oa.oa_buffer.format_size;
-   u32 oastatus2;
u32 oastatus1;
u32 head;
u32 tail;
@@ -572,10 +577,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
return -EIO;
 
-   oastatus2 = I915_READ(GEN7_OASTATUS2);
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+   head = dev_priv->perf.oa.oa_buffer.head;
tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
 
/* XXX: On Haswell we don't have a safe way to clear oastatus1
@@ -616,10 +620,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
dev_priv->perf.oa.ops.oa_disable(dev_priv);

[Intel-gfx] [PATCH 4/5] drm/i915/perf: no head/tail ref in gen7_oa_read

2017-01-23 Thread Robert Bragg
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 51 +++-
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e85583d0bcff..4b3babdbd79e 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -420,8 +420,6 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * @buf: destination buffer given by userspace
  * @count: the number of bytes userspace wants to read
  * @offset: (inout): the current position for writing into @buf
- * @head_ptr: (inout): the current oa buffer cpu read position
- * @tail: the current oa buffer gpu write position
  *
  * Notably any error condition resulting in a short read (-%ENOSPC or
  * -%EFAULT) will be returned even though one or more records may
@@ -439,9 +437,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  char __user *buf,
  size_t count,
- size_t *offset,
- u32 *head_ptr,
- u32 tail)
+ size_t *offset)
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
@@ -449,14 +445,15 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
int tail_margin = dev_priv->perf.oa.tail_margin;
u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma);
u32 mask = (OA_BUFFER_SIZE - 1);
-   u32 head;
+   size_t start_offset = *offset;
+   u32 head, oastatus1, tail;
u32 taken;
int ret = 0;
 
if (WARN_ON(!stream->enabled))
return -EIO;
 
-   head = *head_ptr - gtt_offset;
+   head = dev_priv->perf.oa.oa_buffer.head - gtt_offset;
 
/* An out of bounds or misaligned head pointer implies a driver bug
 * since we are in full control of head pointer which should only
@@ -467,7 +464,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  "Inconsistent OA buffer head pointer = %u\n", head))
return -EIO;
 
-   tail -= gtt_offset;
+   oastatus1 = I915_READ(GEN7_OASTATUS1);
+   tail = (oastatus1 & GEN7_OASTATUS1_TAIL_MASK) - gtt_offset;
 
/* The OA unit is expected to wrap the tail pointer according to the OA
 * buffer size
@@ -477,8 +475,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
  tail);
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
-   *head_ptr = I915_READ(GEN7_OASTATUS2) &
-   GEN7_OASTATUS2_HEAD_MASK;
return -EIO;
}
 
@@ -542,7 +538,18 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
report32[0] = 0;
}
 
-   *head_ptr = gtt_offset + head;
+
+   if (start_offset != *offset) {
+   /* We removed the gtt_offset for the copy loop above, indexing
+* relative to oa_buf_base so put back here...
+*/
+   head += gtt_offset;
+
+   I915_WRITE(GEN7_OASTATUS2,
+  ((head & GEN7_OASTATUS2_HEAD_MASK) |
+   OA_MEM_SELECT_GGTT));
+   dev_priv->perf.oa.oa_buffer.head = head;
+   }
 
return ret;
 }
@@ -570,8 +577,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 {
struct drm_i915_private *dev_priv = stream->dev_priv;
u32 oastatus1;
-   u32 head;
-   u32 tail;
int ret;
 
if (WARN_ON(!dev_priv->perf.oa.oa_buffer.vaddr))
@@ -579,9 +584,6 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 
oastatus1 = I915_READ(GEN7_OASTATUS1);
 
-   head = dev_priv->perf.oa.oa_buffer.head;
-   tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
-
/* XXX: On Haswell we don't have a safe way to clear oastatus1
 * bits while the OA unit is enabled (while the tail pointe

[Intel-gfx] [PATCH 2/5] drm/i915/perf: avoid poll, read, EAGAIN busy loops

2017-01-23 Thread Robert Bragg
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.
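
A minimal userspace sketch of the throttling described above, not the driver code itself. The struct and the model_* function names are hypothetical stand-ins; only the rule "clear pollin on success and on EAGAIN" is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's OA state. */
struct oa_model {
	bool pollin;        /* set by the periodic check; may be a false positive */
	size_t bytes_ready; /* data that the more accurate read-time check finds */
};

/* Models the periodic hrtimer callback: the cheap check may fire too early. */
static void model_timer_tick(struct oa_model *oa, bool cheap_check_says_data)
{
	if (cheap_check_says_data)
		oa->pollin = true;
}

/* Models poll(): reports POLLIN only while the pollin flag is set. */
static bool model_poll(const struct oa_model *oa)
{
	return oa->pollin;
}

/* Models read(): returns bytes copied, or -EAGAIN if nothing was available. */
static long model_read(struct oa_model *oa)
{
	long ret;

	if (oa->bytes_ready == 0) {
		ret = -EAGAIN;
	} else {
		ret = (long)oa->bytes_ready;
		oa->bytes_ready = 0;
	}

	/* Clearing on -EAGAIN too is what breaks the poll/read busy loop. */
	if (ret >= 0 || ret == -EAGAIN)
		oa->pollin = false;

	return ret;
}
```

In this model, a false positive costs at most one wasted read() per timer tick, because poll() cannot report POLLIN again until the next tick sets the flag.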

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b0eec762b9b4..4bb7333dac45 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1352,7 +1352,15 @@ static ssize_t i915_perf_read(struct file *file,
mutex_unlock(&dev_priv->perf.lock);
}
 
-   if (ret >= 0) {
+   /* We allow the poll checking to sometimes report false positive POLLIN
+* events where we might actually report EAGAIN on read() if there's
+* not really any data available. In this situation though we don't
+* want to enter a busy loop between poll() reporting a POLLIN event
+* and read() returning -EAGAIN. Clearing the oa.pollin state here
+* effectively ensures we back off until the next hrtimer callback
+* before reporting another POLLIN event.
+*/
+   if (ret >= 0 || ret == -EAGAIN) {
/* Maybe make ->pollin per-stream state if we support multiple
 * concurrent streams in the future.
 */
-- 
2.11.0



[Intel-gfx] [PATCH 1/5] drm/i915/perf: fix gen7_append_oa_reports comment

2017-01-23 Thread Robert Bragg
If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.
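
For anyone unfamiliar with the convention the comment refers to, here is a small sketch of a power-of-two ring buffer where the reader consumes at the head and the writer appends at the tail. BUF_SIZE and the function names are illustrative assumptions, not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical buffer size; the masking trick needs a power of two. */
#define BUF_SIZE 16u

/*
 * Bytes available to read: the distance from head (read position) forward
 * to tail (write position), allowing for wrap-around.
 */
static uint32_t ring_taken(uint32_t tail, uint32_t head)
{
	return (tail - head) & (BUF_SIZE - 1);
}

/* Advance a position by n bytes, wrapping at the end of the buffer. */
static uint32_t ring_advance(uint32_t pos, uint32_t n)
{
	return (pos + n) & (BUF_SIZE - 1);
}
```

The writer only ever moves the tail and the reader only ever moves the head, and the buffer is empty once the two are equal.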

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a1b7eec58be2..b0eec762b9b4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -431,7 +431,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * userspace.
  *
  * Note: reports are consumed from the head, and appended to the
- * tail, so the head chases the tail?... If you think that's mad
+ * tail, so the tail chases the head?... If you think that's mad
  * and back-to-front you're not alone, but this follows the
  * Gen PRM naming convention.
  *
-- 
2.11.0



[Intel-gfx] [PATCH 5/5] drm/i915/perf: improve tail race workaround

2017-01-23 Thread Robert Bragg
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
 1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
 2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.
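
A rough sketch of the aging/aged tail idea, written as a standalone model rather than the code in this patch. The struct layout, the function names, INVALID_TAIL and the 100us margin value are assumptions for illustration; wrap-around handling and locking are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_TAIL UINT32_MAX
/* Assumed margin for the sketch; stands in for OA_TAIL_MARGIN_NSEC. */
#define TAIL_MARGIN_NSEC 100000ull

struct tail_state {
	uint32_t aging_tail;      /* last sampled HW tail, not yet trusted */
	uint64_t aging_timestamp; /* time (ns) at which aging_tail was sampled */
	uint32_t aged_tail;       /* tail that is safe to use for read() */
};

static void tail_state_init(struct tail_state *s, uint32_t head)
{
	s->aging_tail = INVALID_TAIL;
	s->aging_timestamp = 0;
	s->aged_tail = head;
}

/*
 * Called at a limited rate (e.g. from a timer callback). Returns true if
 * there is data between head and the aged tail.
 */
static bool tail_check(struct tail_state *s, uint32_t hw_tail,
		       uint32_t head, uint64_t now_ns)
{
	/* Promote the aging tail once it has been around long enough. */
	if (s->aging_tail != INVALID_TAIL &&
	    now_ns - s->aging_timestamp > TAIL_MARGIN_NSEC) {
		s->aged_tail = s->aging_tail;
		s->aging_tail = INVALID_TAIL;
	}

	/* Start aging a new tail if the HW has moved on and none is aging. */
	if (s->aging_tail == INVALID_TAIL && hw_tail != s->aged_tail) {
		s->aging_tail = hw_tail;
		s->aging_timestamp = now_ns;
	}

	return s->aged_tail != head;
}
```

Because the decision depends only on elapsed time since a tail value was sampled, the model behaves the same whether reports arrive periodically or not.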

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  59 -
 drivers/gpu/drm/i915/i915_perf.c | 275 ++-
 2 files changed, 241 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e732d0b3bf65..7b2bdc6ccb26 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2038,7 +2038,7 @@ struct i915_oa_ops {
size_t *offset);
 
/**
-* @oa_buffer_is_empty: Check if OA buffer empty (false positives OK)
+* @oa_buffer_check: Check for OA buffer data + update tail
 *
 * This is either called via fops or the poll check hrtimer (atomic
 * ctx) without any locks taken.
@@ -2051,7 +2051,7 @@ struct i915_oa_ops {
 * here, which will be handled gracefully - likely resulting in an
 * %EAGAIN error for userspace.
 */
-   bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+   bool (*oa_buffer_check)(struct drm_i915_private *dev_priv);
 };
 
 struct drm_i915_private {
@@ -2383,8 +2383,6 @@ struct drm_i915_private {
int period_exponent;
int timestamp_frequency;
 
-   int tail_margin;
-
int metrics_set;
 
const struct i915_oa_reg *mux_regs;
@@ -2399,6 +2397,59 @@ struct drm_i915_private {
int format_size;
 
/**
+* Locks reads and writes to all head/tail state
+*
+* Consider: the head and tail pointer state
+* needs to be read consistently from a hrtimer
+* callback (atomic context) and read() fop
+* (user context) with tail pointer updates
+* happening in atomic context and head updates
+* in user context and the (unlikely)
+* possibility of read() errors needing to
+* reset all head/tail state.
+*
+* Note: Contention or performance aren't
+* currently a significant concern here
+* considering the relatively low frequency of
+* hrtimer callbacks (5ms period) and that
+* reads typically only happen in response to a
+* hrtimer event and likely complete before the
+* next callback.
+*
+* Note: Thi

[Intel-gfx] [PATCH 0/5] drm/i915/perf: Improve handling of OA tail race

2017-01-23 Thread Robert Bragg
While I was updating the i915 perf patches for gen8+ I found I also needed to
add a workaround for a tail pointer race, like we see for gen7 too. I was
feeling guilty knowing there are a few limitations with the current workaround
for gen7 and so want to address those before reusing the WA for gen8+ too.

Robert Bragg (5):
  drm/i915/perf: fix gen7_append_oa_reports comment
  drm/i915/perf: avoid poll, read, EAGAIN busy loops
  drm/i915/perf: avoid read back of head register
  drm/i915/perf: no head/tail ref in gen7_oa_read
  drm/i915/perf: improve tail race workaround

 drivers/gpu/drm/i915/i915_drv.h  |  70 +++-
 drivers/gpu/drm/i915/i915_perf.c | 342 ---
 2 files changed, 281 insertions(+), 131 deletions(-)

-- 
2.11.0



[Intel-gfx] [PATCH igt] igt/perf: improve robustness of polling/blocking tests

2017-01-23 Thread Robert Bragg
This patch addresses a couple of problems with both of these tests that
could lead to false negatives.

1) The upper limit for the number of iterations missed a +1 to consider
   that there might be a sample immediately available at the start of the
   loop.

2) The tests didn't consider that a duration measured in terms of
   (end-start) ticks could be +- 1 tick since we don't know the
   fractional part of the tick counts. Our threshold for stime being <
   one tick could have a false negative for any real stime between 1 and
   10 milliseconds depending on luck.

The tests now both run for a lot longer (1000 x tick duration, or
typically 10 seconds each) so that a single tick represents a much
smaller proportion of the total duration (0.1%) and the stime thresholds
are now set at 1% of the total duration.
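
To make the arithmetic concrete, here is a small sketch of how the limits work out. The struct and function names are made up for illustration; the 40ms sampling period and the 46ms worst-case iteration time come from the comments in the patch below.

```c
#include <stdint.h>

/* Hypothetical container for the derived test limits. */
struct test_limits {
	int64_t tick_ns;          /* length of one clock tick */
	int64_t test_duration_ns; /* 1000 ticks */
	int max_iterations;       /* one sample per 40ms, +1 for an early sample */
	int min_iterations;       /* one sample per 46ms worst case */
	int64_t max_kernel_ns;    /* 1% of the test duration */
};

static struct test_limits compute_limits(long ticks_per_sec)
{
	struct test_limits l;

	l.tick_ns = 1000000000ll / ticks_per_sec;
	l.test_duration_ns = l.tick_ns * 1000;
	l.max_iterations = (int)(l.test_duration_ns / 40000000ll) + 1;
	l.min_iterations = (int)(l.test_duration_ns / 46000000ll);
	l.max_kernel_ns = l.test_duration_ns / 100;

	return l;
}
```

With the typical 100 ticks per second this gives a 10ms tick, a 10 second test, between 217 and 251 iterations, and an stime budget of 100ms (10 ticks), so a +- 1 tick measurement error is small relative to the threshold.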

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 tests/perf.c | 140 +++
 1 file changed, 94 insertions(+), 46 deletions(-)

diff --git a/tests/perf.c b/tests/perf.c
index c9c5c57e..f3db84dd 100644
--- a/tests/perf.c
+++ b/tests/perf.c
@@ -1263,18 +1263,50 @@ test_blocking(void)
struct tms end_times;
int64_t user_ns, kernel_ns;
int64_t tick_ns = 1000000000 / sysconf(_SC_CLK_TCK);
+   int64_t test_duration_ns = tick_ns * 1000;
+
+   /* Based on the 40ms OA sampling period set above: max OA samples: */
+   int max_iterations = (test_duration_ns / 40000000ull) + 1;
+
+   /* It's a bit tricky to put a lower limit here, but we expect a
+* relatively low latency for seeing reports, while we don't currently
+* give any control over this in the api.
+*
+* We assume a maximum latency of 6 milliseconds to deliver a POLLIN and
+* read() after a new sample is written (46ms per iteration) considering
+* the knowledge that the driver uses a 200Hz hrtimer (5ms period)
+* to check for data and giving some time to read().
+*/
+   int min_iterations = (test_duration_ns / 46000000ull);
+
int64_t start;
int n = 0;
 
times(&start_times);
 
-   /* Loop for 600ms performing blocking reads while the HW is sampling at
+   igt_debug("tick length = %dns, test duration = %"PRIu64"ns, min iter. = %d, max iter. = %d\n",
+ (int)tick_ns, test_duration_ns,
+ min_iterations, max_iterations);
+
+   /* In the loop we perform blocking polls while the HW is sampling at
 * ~25Hz, with the expectation that we spend most of our time blocked
 * in the kernel, and shouldn't be burning cpu cycles in the kernel in
 * association with this process (verified by looking at stime before
 * and after loop).
+*
+* We're looking to assert that less than 1% of the test duration is
+* spent in the kernel dealing with polling and read()ing.
+*
+* The test runs for a relatively long time considering the very low
+* resolution of stime in ticks of typically 10 milliseconds. Since we
+* don't know the fractional part of tick values we read from userspace,
+* our minimum threshold needs to be >= one tick since any
+* measurement might really be +- tick_ns (assuming we effectively get
+* floor(real_stime)).
+*
+* We loop for 1000 x tick_ns so one tick corresponds to 0.1%
 */
-   for (start = get_time(); (get_time() - start) < 600000000; /* nop */) {
+   for (start = get_time(); (get_time() - start) < test_duration_ns; /* nop */) {
int ret;
 
while ((ret = read(stream_fd, buf, sizeof(buf))) < 0 &&
@@ -1294,33 +1326,26 @@ test_blocking(void)
user_ns = (end_times.tms_utime - start_times.tms_utime) * tick_ns;
kernel_ns = (end_times.tms_stime - start_times.tms_stime) * tick_ns;
 
-   igt_debug("%d blocking reads in 500 milliseconds, with 1KHz OA sampling\n", n);
-   igt_debug("time in userspace = %"PRIu64"ns (start utime = %d, end = %d, ns ticks per sec = %d)\n",
- user_ns, (int)start_times.tms_utime, (int)end_times.tms_utime, (int)tick_ns);
-   igt_debug("time in kernelspace = %"PRIu64"ns (start stime = %d, end = %d, ns ticks per sec = %d)\n",
- kernel_ns, (int)start_times.tms_stime, (int)end_times.tms_stime, (int)tick_ns);
+   igt_debug("%d blocking reads during test with 25Hz OA sampling\n", n);
+   igt_debug("time in userspace = %"PRIu64"ns (+-%dns) (start utime = %d, end = %d)\n",
+ user_ns, (int)tick_ns,
+ (int)start_times.tms_utime, (int)end_times.tms_utime);
+   igt_debug("time in kernelspace = %"PRIu64"ns (+-%dns) (start stime = %d, end = %d)\n",
+ kerne

Re: [Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-12-09 Thread Robert Bragg
On Thu, Dec 8, 2016 at 3:53 PM, Daniel Vetter <dan...@ffwll.ch> wrote:
> On Wed, Dec 07, 2016 at 09:40:33PM +0000, Robert Bragg wrote:
>> This adds a 'Perf' section to i915.rst with the following sub sections:
>> - Overview
>> - Comparison with Core Perf
>> - i915 Driver Entry Points
>> - i915 Perf Stream
>> - i915 Perf Observation Architecture Stream
>> - All i915 Perf Internals
>>
>> v2:
>> section headers in i915.rst (Daniel Vetter)
>> missing symbol docs + other fixups (Matthew Auld)
>>
>> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> Reviewed-by: Matthew Auld <matthew.a...@intel.com>
>> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
>
> Obligatory bikeshed about explicitly listing all functions, but hey
> great docs, I'll merge ;-)

Yeah, understood.

Not sure if you saw my last reply to this, but I think listing symbols
explicitly is appropriate for documentation, where the ordering and
groupings of symbols for the sake of introducing them in logical
stages doesn't necessarily match that found in code.

Not perfect in itself but having worked with gtk-doc in the past I
thought it made sense, and worked reasonably enough, to explicitly
list section headers and symbols in a sections.txt file to control the
order things are presented to developers. Of course we sometimes
forgot to add new symbols to the sections.txt file which is a trade
off, but it never seemed like that big of an issue, compared to having
the extra editorial control over your docs.

Jani's comment was also fair about the fact that the current approach
results in lots of runs of the kernel-doc script, but that one really
seems like an implementation detail where we should avoid compromising
docs to work around the tools (ofc just my interpretation that it would
be a compromise). I did half consider modifying kernel-doc to allow
for the -functions argument to take an ordered list instead of a set,
but couldn't quite muster the energy to modify a perl script :-p

As a bit of an aside; last year for another project of mine I
wrote an experimental tool for extracting gtk-doc comments from code
using the python clang bindings:
https://github.com/rib/clib/blob/master/site/rst-from-c.py

It's crossed my mind to play around with being able to extract
kernel-doc with clang along similar lines which could potentially
track more type information so the docs could support more than just
.. c:function and .. c:type, such as c:macro and c:member and could
better handle things like function pointers as members of structs.
Maybe it's an idea worth considering.


>
> Thanks, applied to dinq.

Thanks,

- Robert

> -Daniel
>
>> ---
>>  Documentation/gpu/i915.rst   |  91 +
>>  drivers/gpu/drm/i915/i915_drv.h  | 151 +++---
>>  drivers/gpu/drm/i915/i915_perf.c | 412 ---
>>  3 files changed, 598 insertions(+), 56 deletions(-)
>>
>> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
>> index 117d2ab..3843ef6 100644
>> --- a/Documentation/gpu/i915.rst
>> +++ b/Documentation/gpu/i915.rst
>> @@ -356,4 +356,95 @@ switch_mm
>>  .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
>> :doc: switch_mm tracepoint
>>
>> +Perf
>> +====
>> +
>> +Overview
>> +--------
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :doc: i915 Perf Overview
>> +
>> +Comparison with Core Perf
>> +-------------------------
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :doc: i915 Perf History and Comparison with Core Perf
>> +
>> +i915 Driver Entry Points
>> +------------------------
>> +
>> +This section covers the entrypoints exported outside of i915_perf.c to
>> +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_init
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_fini
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_register
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_unregister
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_open_ioctl
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
>> +   :functions: i915_perf_release
>> +
>> +i915 Perf Stream
>> +----------------
>> +
>> +This section covers the stream-semantics-agnostic structures and functions
>> +for representing an i915 perf stream FD and associated file operations.
>> +
>> +.. kernel-doc:: drivers/gpu/dr

Re: [Intel-gfx] [RFC v2] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-08 Thread Robert Bragg
On Thu, Dec 8, 2016 at 12:17 AM, Daniel Vetter <dan...@ffwll.ch> wrote:
>
> On Wed, Dec 07, 2016 at 06:35:29PM +0000, Robert Bragg wrote:
> > This is still missing corresponding documentation changes, and I haven't
> > moved anything to drm_print.h yet, as suggested.
> >
> > Sending out with a few functional improvements first to get agreement
> > before documenting anything (changes summarised in v2: section below)
> >
> > In particular, affecting the output format, I stole an idea from Tvrtko
> > Ursulin to have the prefix for messages be based on the driver name,
> > such as "[i915]" instead of always being "[drm]".
> >
> > Depending on people's thoughts on compatibility, we could consider
> > removing the prefix given that the dynamic debug control interface has a
> > way of specifying that messages should include a module name, function
> > or line info like:
> >
> > echo "module i915 +mfp" > dynamic_debug/control
> >
> > That would enable all i915 debug messages with a module and function
> > prefix.
> >
> > A trade-off would be that anyone only using the drm.drm_debug interface
> > to control messages would lose some information. If we really wanted we
> > could have the best of both by adding a utility printing api that can
> > recognise when printing due to a dynamic debug control query vs
> > drm.drm_debug to conditionally add the prefix.
> >
> > --- >8 --- (git am --scissors)
> >
> > Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
> > allow fine grained control over which debug messages are enabled with
> > runtime control through /sys/kernel/debug/dynamic_debug/control
> >
> > This provides more control than the current drm.drm_debug parameter
> > which for some use cases is impractical to use given how chatty
> > some drm debug categories are.
> >
> > For example all debug messages in i915_perf.c can be enabled with:
> > echo "file i915_perf.c +p" > dynamic_debug/control
> >
> > This doesn't strictly maintain format compatibility with the previous
> > debug messages since the category is now added as part of the prefix
> > like "[drm][kms] No FB found". Adding the categories with a consistent
> > format makes it possible to enable categories with a dynamic debug
> > query like: echo "format [kms] +p" > dynamic_debug/control
> >
> > This maintains support for enabling debug messages using the drm_debug
> > parameter. If dynamic debug is not enabled via CONFIG_DYNAMIC_DEBUG the
> > debug messages essentially work as before, except with the inclusion of
> > categories in the format strings as described above.
> >
> > This removes the drm_[dev_]printk wrappers considering that the dynamic
> > debug macros are only useful if they can track the __FILE__, __func__
> > and __LINE__ where they are called. The wrapper didn't seem necessary in
> > the DRM_UT_NONE case with no category flag.
> >
> > The non _DEV macros are no longer defined in terms of passing NULL to a
> > _DEV variant to avoid having the core.c dev_printk implementation adding
> > "(NULL device *)". The previous drm_[dev_]printk function used to handle
> > this as a special case.
> >
> > Instead of using DRM_NAME to add [drm] to the start of every message,
> > the prefix is now based on module_name(THIS_MODULE) so it will be [drm]
> > or e.g. [i915] for the Intel driver. Later we might consider removing
> > the prefix altogether considering that the dynamic debug control
> > interface has a way of optionally adding the module, function or line to
> > the formatting of messages.
> >
> > v2:
> > Add categories to format like "[drm][kms] No FB found"
> > Only single conditional call per message (macros expand to less code)
> > Uses __dynamic_pr_debug/dev_dbg for dynamic formatting features
> > Use module name for msg prefix like [drm] or [i915]
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > Cc: dri-de...@lists.freedesktop.org
> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> > Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
>
> So assuming I understand it correctly - I like this 3way cascade of
> dynamic debug, then printk and no_printk fallback if CONFIG_DEBUG=n for
> the space conscious. But I guess we do need to add a DRM Kconfig knob to
> set DEBUG, at least I'm not entirely sure how that's supposed to work. Or
> we might need to have our own #ifdef maze for this. Maybe we need to keep

[Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-12-07 Thread Robert Bragg
This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

v2:
section headers in i915.rst (Daniel Vetter)
missing symbol docs + other fixups (Matthew Auld)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.a...@intel.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 Documentation/gpu/i915.rst   |  91 +
 drivers/gpu/drm/i915/i915_drv.h  | 151 +++---
 drivers/gpu/drm/i915/i915_perf.c | 412 ---
 3 files changed, 598 insertions(+), 56 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 117d2ab..3843ef6 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -356,4 +356,95 @@ switch_mm
 .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
:doc: switch_mm tracepoint
 
+Perf
+====
+
+Overview
+--------
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf Overview
+
+Comparison with Core Perf
+-------------------------
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf History and Comparison with Core Perf
+
+i915 Driver Entry Points
+------------------------
+
+This section covers the entrypoints exported outside of i915_perf.c to
+integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_fini
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_register
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_unregister
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_release
+
+i915 Perf Stream
+----------------
+
+This section covers the stream-semantics-agnostic structures and functions
+for representing an i915 perf stream FD and associated file operations.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: read_properties_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_destroy_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_enable_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_disable_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll_locked
+
+i915 Perf Observation Architecture Stream
+-----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_oa_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_enable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_disable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_wait_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_poll_wait
+
+All i915 Perf Internals
+-----------------------
+
+This section simply includes all currently documented i915 perf internals, in
+no particular order, but may include some more minor utilities or platform
+specific details than found in the more high-level sections.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :internal:
+
 .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 33758ac..49c7651 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1847,89 +1847,186 @@ struct i915_oa_reg {
 
 struct i915_perf_stream;
 
+/**
+ * struct i915_perf_stream_ops - the OPs to support a specific stream type
+ */
 struct i915_perf_stream_ops {
-   /* Enables the collection of HW samples, either in response to
-* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
-* opened without I915_PERF_FLAG_DISABLED.
+   /**
+* @enable: Enables the collection of HW samples, either in response to
+* `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened

Re: [Intel-gfx] [RFC] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-07 Thread Robert Bragg
On Mon, Dec 5, 2016 at 4:31 PM, Daniel Vetter <dan...@ffwll.ch> wrote:

> On Mon, Dec 05, 2016 at 11:24:44AM +0000, Robert Bragg wrote:
> > Forgot to send to dri-devel when I first sent this out...
> >
> > The few times I've looked at using DRM_DEBUG messages, I haven't found
> > them very helpful considering how noisy some of the categories are. More
> > than once now I've preferred to go in and modify individual files to
> > affect what messages I see and re-build.
> >
> > After recently converting some of the i915_perf.c messages to use
> > DRM_DEBUG, I thought I'd see if DRM_DEBUG could be updated to have a bit
> > more fine grained control than the current category flags.
> >
> > A few things to note with this first iteration:
> >
> > - I haven't looked to see what effect the change has on linked object
> >   sizes.
> >
> > - It seems like it could be nice if dynamic debug could effectively make
> >   the drm_debug parameter redundant but dynamic debug doesn't give us a
> >   way to categorise messages so maybe we'd want to consider including
> >   categories in messages something like:
> >
> >   "[drm][kms] No FB found"
> >
> >   This way all kms messages could be enabled via:
> >   echo "format [kms] +p" > dynamic_debug/control
> >
> >   Note with this simple scheme categories would no longer be mutually
> >   exclusive which could be a nice bonus.
>
> Really nice idea, and I agree that unifying drm.debug with dynamic debug
> in some way would be useful. We could implement your idea by reworking the
> existing debug helpers to auto-prepend the right string. That also opens
> up the door for much more fine-grained bucketing maybe, only challenge is
> that we should document things somewhere.
>

yup, I don't mind writing some doc updates for this if it looks worthwhile.


>
> >   Since it would involve changing the output format, I wonder how
> >   concerned others might be about breaking some userspace (maybe CI test
> >   runners) that for some reason grep for specific messages?
>
> I think the only thing we have to keep working (somehow) is drm.debug. The
> exact output format doesn't really matter at all. Getting drm.debug to
> work when dynamic debugging is enabled probably requires exporting some
> functions, so that we can set the right ddebug options from the drm.debug
> mod-option write handler. There's special mod-option macros that allow you
> to specify write handlers, so that part is ok.
>

dynamic_debug.h exposes a macro for declaring your own dynamic debug meta
data as well as a macro for testing whether the message has been enabled.

I'm handling compatibility by using those macros so I can still test the
drm.drm_debug flags.

Handling compatibility in terms of running control queries from the kernel
would be a bit more tricky since we'd need to export some api from
dynamic_debug.c as well as adding a write handler for drm_debug. Also the
enabledness of messages is boolean not refcounted so I suppose there could
be slightly annoying interactions if mixing both - though that could be
documented.

The only disadvantage I can think of currently for not handling
compatibility in terms of running control queries is that the dynamic debug
macros can normally avoid evaluating any conditions on the cpu while a
message is disabled, based on jump labels/static branches. We were already
evaluating a condition for disabled drm debug messages though, so it seems
reasonable to continue for now.


>
> The other bit of backwards compat we imo need is that by default we should
> still keep drm.debug working, even when dynamic debugging is disabled.
> Having a third option that uses no_printk or similar (to get rid of all
> the debug strings and dead-code-eliminate all the related output code)
>

Yeah, I think the current code already handles this, but sorry if it's not
clear.

This version is #ifdefed so that if dynamic debug isn't enabled the dynamic
debug path reduces to a no_printk.

I'm considering CONFIG_DYNAMIC_DEBUG being enabled or not: when enabled
I check drm_debug and the dynamic debug state; when disabled I'm just
checking the drm_debug flags and the dynamic debug bits compile out.

In my updated patch things were re-jigged a little so pr_debug and dev_dbg
are used when CONFIG_DYNAMIC_DEBUG is not enabled, and these internally
boil down to no_printk if DEBUG is disabled. Actually we might want to
consider if that's the desired behaviour - since DRM_DEBUG wasn't
previously affected by DEBUG being defined or not.
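
As a simplified illustration of the three build configurations discussed here, the following userspace sketch selects a debug path at compile time. `SKETCH_DYNAMIC_DEBUG` and `SKETCH_DEBUG` are hypothetical stand-ins for CONFIG_DYNAMIC_DEBUG and DEBUG, and `sketch_no_printk()` only imitates what no_printk does; none of these names come from the patch, and the dynamic path is reduced to a plain print for brevity.

```c
#include <stdio.h>

/*
 * Userspace stand-in for no_printk(): the arguments are still
 * type-checked by the compiler, but the call is dead code.
 */
#define sketch_no_printk(...) \
	do { if (0) printf(__VA_ARGS__); } while (0)

enum sketch_path {
	SKETCH_PATH_DYNAMIC,		/* dynamic debug built in */
	SKETCH_PATH_PRINTK,		/* no dynamic debug, DEBUG defined */
	SKETCH_PATH_COMPILED_OUT,	/* no dynamic debug, DEBUG undefined */
};

#if defined(SKETCH_DYNAMIC_DEBUG)
#define SKETCH_SELECTED SKETCH_PATH_DYNAMIC
#define sketch_dbg(...) printf(__VA_ARGS__)
#elif defined(SKETCH_DEBUG)
#define SKETCH_SELECTED SKETCH_PATH_PRINTK
#define sketch_dbg(...) printf(__VA_ARGS__)
#else
#define SKETCH_SELECTED SKETCH_PATH_COMPILED_OUT
#define sketch_dbg(...) sketch_no_printk(__VA_ARGS__)
#endif

/*
 * Emit one message if its category bit is set, and report which path
 * this translation unit was built with. In the last configuration the
 * category check still runs but nothing can ever be printed, which is
 * the behaviour change questioned above.
 */
static enum sketch_path sketch_debug(unsigned int mask, unsigned int category,
				     const char *msg)
{
	if (mask & category)
		sketch_dbg("[sketch] %s\n", msg);
	return SKETCH_SELECTED;
}
```

Building the sketch with neither macro defined gives the compiled-out path, so a category that is enabled in the mask still prints nothing.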


> > --- >8 --- (git am --scissors)
> >
> > Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
> > allow fine grained control over which debug messages are ena

[Intel-gfx] [RFC v2] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-07 Thread Robert Bragg
This is still missing corresponding documentation changes, and I haven't
moved anything to drm_print.h yet, as suggested.

Sending out with a few functional improvements first to get agreement
before documenting anything (changes summarised in v2: section below)

In particular, affecting the output format, I stole an idea from Tvrtko
Ursulin to have the prefix for messages be based on the driver name,
such as "[i915]" instead of always being "[drm]".
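
A minimal userspace sketch of that prefixing idea follows. `sketch_prefix_msg()` is a hypothetical helper, not something in the patch; in the kernel the name would come from module_name(THIS_MODULE) at the call site rather than being passed in as a parameter.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "[<module>] <message>" into buf.
 * Returns what snprintf() returns, i.e. the length the full string
 * needs, excluding the terminating NUL.
 */
static int sketch_prefix_msg(char *buf, size_t size,
			     const char *modname, const char *msg)
{
	return snprintf(buf, size, "[%s] %s", modname, msg);
}
```

Calling it with "i915" and "RC6 on" produces "[i915] RC6 on", while the same call made from the core module would produce "[drm] RC6 on".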

Depending on people's thoughts on compatibility, we could consider
removing the prefix given that the dynamic debug control interface has a
way of specifying that messages should include a module name, function
or line info like:

echo "module i915 +mfp" > dynamic_debug/control

That would enable all i915 debug messages with a module and function
prefix.
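
For readers unfamiliar with the flags part of such a query, here is a small userspace sketch of how a spec like "+mfp" could be interpreted. The flag letters (p, f, l, m, t) and the +, - and = operators are the ones dynamic debug documents; the `SKETCH_*` bit values and `sketch_apply_flags()` are invented for illustration and are not the kernel's parser.

```c
#include <stdbool.h>

/* Arbitrary bit assignments for this sketch only. */
enum {
	SKETCH_PRINT    = 1 << 0,	/* p: enable the message */
	SKETCH_FUNCNAME = 1 << 1,	/* f: prefix with the function name */
	SKETCH_LINENO   = 1 << 2,	/* l: prefix with the line number */
	SKETCH_MODNAME  = 1 << 3,	/* m: prefix with the module name */
	SKETCH_TID      = 1 << 4,	/* t: prefix with the thread ID */
};

/*
 * Apply a flags spec such as "+mfp", "-p" or "=pl" to *flags.
 * Returns false, leaving *flags untouched, if the spec is malformed.
 */
static bool sketch_apply_flags(const char *spec, unsigned int *flags)
{
	unsigned int mask = 0;
	char op = spec[0];
	const char *c;

	if (op != '+' && op != '-' && op != '=')
		return false;

	for (c = spec + 1; *c; c++) {
		switch (*c) {
		case 'p': mask |= SKETCH_PRINT; break;
		case 'f': mask |= SKETCH_FUNCNAME; break;
		case 'l': mask |= SKETCH_LINENO; break;
		case 'm': mask |= SKETCH_MODNAME; break;
		case 't': mask |= SKETCH_TID; break;
		default:
			return false;
		}
	}

	if (op == '+')
		*flags |= mask;		/* add the listed flags */
	else if (op == '-')
		*flags &= ~mask;	/* remove the listed flags */
	else
		*flags = mask;		/* set exactly the listed flags */
	return true;
}
```

Applying "+mfp" to an empty flag set enables printing together with the module and function prefixes; a later "-p" silences the message again but leaves the prefix choices in place.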

A trade-off would be that anyone only using the drm.drm_debug interface
to control messages would lose some information. If we really wanted we
could have the best of both by adding a utility printing api that can
recognise when printing due to a dynamic debug control query vs
drm.drm_debug to conditionally add the prefix.

--- >8 --- (git am --scissors)

Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
allow fine grained control over which debug messages are enabled with
runtime control through /sys/kernel/debug/dynamic_debug/control

This provides more control than the current drm.drm_debug parameter
which for some use cases is impractical to use given how chatty
some drm debug categories are.

For example all debug messages in i915_perf.c can be enabled with:
echo "file i915_perf.c +p" > dynamic_debug/control

This doesn't strictly maintain format compatibility with the previous
debug messages since the category is now added as part of the prefix
like "[drm][kms] No FB found". Adding the categories with a consistent
format makes it possible to enable categories with a dynamic debug
query like: echo "format [kms] +p" > dynamic_debug/control

This maintains support for enabling debug messages using the drm_debug
parameter. If dynamic debug is not enabled via CONFIG_DYNAMIC_DEBUG the
debug messages essentially work as before, except with the inclusion of
categories in the format strings as described above.

This removes the drm_[dev_]printk wrappers considering that the dynamic
debug macros are only useful if they can track the __FILE__, __func__
and __LINE__ where they are called. The wrapper didn't seem necessary in
the DRM_UT_NONE case with no category flag.

The non _DEV macros are no longer defined in terms of passing NULL to a
_DEV variant to avoid having the core.c dev_printk implementation add
"(NULL device *)". The previous drm_[dev_]printk function used to handle
this as a special case.

Instead of using DRM_NAME to add [drm] to the start of every message,
the prefix is now based on module_name(THIS_MODULE) so it will be [drm]
or e.g. [i915] for the Intel driver. Later we might consider removing
the prefix altogether considering that the dynamic debug control
interface has a way of optionally adding the module, function or line to
the formatting of messages.

v2:
Add categories to format like "[drm][kms] No FB found"
Only single conditional call per message (macros expand to less code)
Uses __dynamic_pr_debug/dev_dbg for dynamic formatting features
Use module name for msg prefix like [drm] or [i915]

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: dri-de...@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
---
 drivers/gpu/drm/drm_drv.c |  47 ---
 include/drm/drmP.h| 202 +-
 2 files changed, 127 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index f74b7d0..25d00aa 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -65,53 +65,6 @@ static struct idr drm_minors_idr;
 
 static struct dentry *drm_debugfs_root;
 
-#define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
-
-void drm_dev_printk(const struct device *dev, const char *level,
-   unsigned int category, const char *function_name,
-   const char *prefix, const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   if (dev)
-   dev_printk(level, dev, DRM_PRINTK_FMT, function_name, prefix,
-  &vaf);
-   else
-   printk("%s" DRM_PRINTK_FMT, level, function_name, prefix, &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_dev_printk);
-
-void drm_printk(const char *level, unsigned int category,
-  

Re: [Intel-gfx] [RFC 0/5] DRM logging tidy

2016-12-07 Thread Robert Bragg
On Tue, Dec 6, 2016 at 6:57 PM, Tvrtko Ursulin  wrote:

> From: Tvrtko Ursulin 
>
> I wasn't here at the beginnings of DRM so I might have gotten this wrong,
> however the existence of DRM_NAME suggested to me that the intention was to
> allow individual drivers to override it and get appropriate prefixes in
> their log messages.
>
> I can't see that any driver is using it like that but I still thought it
> would be neat to do that. That way we could have our log messages look more
> obviously ours. For example after this series we have:
>
>  [i915] Memory usable by graphics device = 4096M
>  [i915] VT-d active for gfx access
>  [i915] Replacing VGA console driver
>  [i915] ACPI BIOS requests an excessive sleep of 2 ms, using 1500 ms instead
>  [i915] Finished loading DMC firmware i915/skl_dmc_ver1_26.bin (v1.26)
>  [i915] Disabling framebuffer compression (FBC) to prevent screen flicker with VT-d enabled
>  [i915] GuC firmware load skipped
>  [i915] Initialized i915 1.6.0 20161205 for :00:02.0 on minor 0
>  [i915] DRM_I915_DEBUG enabled
>  [i915] DRM_I915_DEBUG_GEM enabled
>  [i915] RC6 on
>
> Previously all that was prefixed with "[drm]" which was OK but I think the
> above is even better.
>
> Also to consider is that recent drm_printk work has removed (it hardcoded)
> DRM_NAME from DRM_ERROR and DRM_DEBUG macros, while leaving it with the
> rest (DRM_INFO, NOTE and WARNING) creating a bit of an inconsistency.
>

I wonder if I can maybe fold some of this idea into my related DRM_DEBUG
[RFC] sent out recently:
https://lists.freedesktop.org/archives/dri-devel/2016-December/126094.html

Instead of using DRM_NAME, I've experimented with updating my changes
adding support for dynamic debug to add a prefix based on
module_name(THIS_MODULE) for a similar result.

One thing to consider here is that with the addition of dynamic debug
support this prefix arguably becomes redundant because the
dynamic_debug/control interface lets you choose to add a module name or
function prefix to messages, e.g. like:

echo "module i915 +mfp" > dynamic_debug/control

I've ignored the redundancy because my change still allows enabling
messages with the drm.drm_debug parameter and in that case the prefix is
still useful.

Br,
- Robert



> This series also makes all the logging macros use drm_printk, but also
> makes DRM_NAME passed in from the macro wrappers in all cases. So drivers
> can override it regardless of the log level.
>
> And finally, the series also removes a bit of redundant data from the debug
> messages effectively converting this:
>
>  [drm:edp_panel_off [i915]] Wait for panel power off time
>
> Into this:
>
>  [edp_panel_off [i915]] Wait for panel power off time
>
> Which still has all the data in it.
>
> Tvrtko Ursulin (5):
>   drm/i915: Give our log messages our name
>   drm: Respect driver set DRM_NAME in drm_printk
>   drm: Respect driver set DRM_NAME in drm_dev_printk
>   drm: Use drm_printk for all logging macros
>   drm: Do not log driver prefix in debug messages
>
>  drivers/gpu/drm/drm_drv.c   | 39 +++--
>  drivers/gpu/drm/i915/i915_drv.c |  3 +-
>  include/drm/drmP.h  | 94 --
> ---
>  include/drm/drm_drv.h   | 11 ++---
>  include/uapi/drm/i915_drm.h |  3 ++
>  5 files changed, 92 insertions(+), 58 deletions(-)
>
> --
> 2.7.4
>


[Intel-gfx] [RFC] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-05 Thread Robert Bragg
Forgot to send to dri-devel when I first sent this out...

The few times I've looked at using DRM_DEBUG messages, I haven't found
them very helpful considering how noisy some of the categories are. More
than once now I've preferred to go in and modify individual files to
affect what messages I see and re-build.

After recently converting some of the i915_perf.c messages to use
DRM_DEBUG, I thought I'd see if DRM_DEBUG could be updated to have a bit
more fine grained control than the current category flags.

A few things to note with this first iteration:

- I haven't looked to see what effect the change has on linked object
  sizes.

- It seems like it could be nice if dynamic debug could effectively make
  the drm_debug parameter redundant but dynamic debug doesn't give us a
  way to categorise messages so maybe we'd want to consider including
  categories in messages something like:

  "[drm][kms] No FB found"

  This way all kms messages could be enabled via:
  echo "format [kms] +p" > dynamic_debug/control

  Note with this simple scheme categories would no longer be mutually
  exclusive which could be a nice bonus.

  Since it would involve changing the output format, I wonder how
  concerned others might be about breaking some userspace (maybe CI test
  runners) that for some reason grep for specific messages?
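
The category idea in the list above relies on the dynamic debug "format" keyword searching for its argument anywhere in a message's format string. A minimal userspace sketch of that matching rule is below; the function names are invented for illustration, and this is not the kernel's query code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A callsite matches if the query text occurs anywhere in its format. */
static bool sketch_format_matches(const char *fmt, const char *query)
{
	return strstr(fmt, query) != NULL;
}

/* Count how many of the given format strings a query would select. */
static size_t sketch_count_matches(const char *const *fmts, size_t n,
				   const char *query)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (sketch_format_matches(fmts[i], query))
			hits++;
	return hits;
}
```

Because matching is by substring, a hypothetical format such as "[drm][kms][atomic] commit" would be selected by both a "[kms]" and an "[atomic]" query, which is the sense in which categories stop being mutually exclusive.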

--- >8 --- (git am --scissors)

Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
allow fine grained control over which debug messages are enabled with
runtime control through /sys/kernel/debug/dynamic_debug/control

This provides more control than the current drm.drm_debug parameter
which for some use cases is impractical to use given how chatty
some drm debug categories are.

For example all debug messages in i915_perf.c can be enabled with:
echo "file i915_perf.c +p" > dynamic_debug/control

This aims to maintain compatibility with controlling debug messages
using the drm_debug parameter. The new dynamic debug macros are called
by default, but [dev_]printk is called directly if the category flag
is set (sidestepping the dynamic debug condition in that case).

This removes the drm_[dev_]printk wrappers considering that the dynamic
debug macros are only useful if they can track the __FILE__, __func__
and __LINE__ where they are called. The wrapper didn't seem necessary in
the DRM_UT_NONE case with no category flag.

The output format should be compatible, unless the _DEV macros are
passed a NULL dev pointer considering how the core.c dev_printk
implementation adds "(NULL device *)" to the message in that case while
the drm wrapper would fall back to a plain printk in this case.
Previously some of the non-dev drm debug macros were defined in terms of
passing NULL to a dev version but that's avoided now due to this
difference.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: dri-de...@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_drv.c |  47 -
 include/drm/drmP.h| 168 +-
 2 files changed, 108 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index f74b7d0..25d00aa 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -65,53 +65,6 @@ static struct idr drm_minors_idr;
 
 static struct dentry *drm_debugfs_root;
 
-#define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
-
-void drm_dev_printk(const struct device *dev, const char *level,
-   unsigned int category, const char *function_name,
-   const char *prefix, const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   if (dev)
-   dev_printk(level, dev, DRM_PRINTK_FMT, function_name, prefix,
-  &vaf);
-   else
-   printk("%s" DRM_PRINTK_FMT, level, function_name, prefix, &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_dev_printk);
-
-void drm_printk(const char *level, unsigned int category,
-   const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   printk("%s" "[" DRM_NAME ":%ps]%s %pV",
-  level, __builtin_return_address(0),
-  strcmp(level, KERN_ERR) == 0 ? " *ERROR*" : "", &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_printk);
-
 /*
  * DRM Minors
  * A DRM device can provide several char-dev interfaces on the DRM-Major. Each

[Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-12-05 Thread Robert Bragg
This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

v2:
section headers in i915.rst (Daniel Vetter)
missing symbol docs + other fixups (Matthew Auld)

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Matthew Auld <matthew.a...@intel.com>
---
 Documentation/gpu/i915.rst   |  91 +
 drivers/gpu/drm/i915/i915_drv.h  | 151 +++---
 drivers/gpu/drm/i915/i915_perf.c | 412 ---
 3 files changed, 598 insertions(+), 56 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 117d2ab..847a094 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -356,4 +356,95 @@ switch_mm
 .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
:doc: switch_mm tracepoint
 
+Perf
+====
+
+Overview
+--------
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf Overview
+
+Comparison with Core Perf
+-------------------------
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf History and Comparison with Core Perf
+
+i915 Driver Entry Points
+------------------------
+
+This section covers the entrypoints exported outside of i915_perf.c to
+integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_fini
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_register
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_unregister
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_release
+
+i915 Perf Stream
+----------------
+
+This section covers the stream-semantics-agnostic structures and functions
+for representing an i915 perf stream FD and associated file operations.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: read_properties_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_destroy_locked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_enable_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_disable_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll_locked
+
+i915 Perf Observation Architecture Stream
+-----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_oa_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_enable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_disable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_wait_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_poll_wait
+
+All i915 Perf Internals
+-----------------------
+
+This section simply includes all currently documented i915 perf internals, in
+no particular order, but may include some more minor utilities or platform
+specific details than found in the more high-level sections.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :internal:
+
 .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ca9786c..1ddebc7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1830,89 +1830,186 @@ struct i915_oa_reg {
 
 struct i915_perf_stream;
 
+/**
+ * struct i915_perf_stream_ops - the OPs to support a specific stream type
+ */
 struct i915_perf_stream_ops {
-   /* Enables the collection of HW samples, either in response to
-* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
-* opened without I915_PERF_FLAG_DISABLED.
+   /**
+* @enable: Enables the collection of HW samples, either in response to
+* `I915_PERF_IOCTL_ENABLE` or implicitly called when stream is opened

Re: [Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-12-02 Thread Robert Bragg
On Thu, Dec 1, 2016 at 12:12 PM, Jani Nikula <jani.nik...@linux.intel.com>
wrote:

> On Wed, 30 Nov 2016, Daniel Vetter <dan...@ffwll.ch> wrote:
> > On Tue, Nov 29, 2016 at 05:00:55PM +, Robert Bragg wrote:
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_init
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_fini
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_register
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_unregister
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_open_ioctl
> >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> >> +   :functions: i915_perf_release
> >
> > One potential issue with listing everything explicitly is that if someone
> > ever (and this is bound to happen) adds a new function, they'll forget to
> > add it. Hence we just pull them all in, and if you want to reference some
> > specifically, do that in the overview sections.
>
> One real issue with listing everything separately is that kernel-doc
> parses the source file once per every kernel-doc directive.
>

Yeah, this is unfortunate and I'd originally hoped I could pass an ordered
list which could reduce how often kernel-doc is run. In practice I haven't
seen a performance problem with doing this though.


>
> Also, doesn't Sphinx complain about not having a blank line to end the
> indented block after the directive? It might not, but I thought it
> might.
>

Apparently it's ok, I've been generating and previewing the documentation
and haven't seen a warning about this.

From the reStructuredText spec, regarding whitespace:
"Blank lines may be omitted when the markup makes element separation
unambiguous, in conjunction with indentation."

Regards,
- Robert


>
> BR,
> Jani.
>
>
> --
> Jani Nikula, Intel Open Source Technology Center
>
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-12-02 Thread Robert Bragg
On Nov 30, 2016 19:41, "Daniel Vetter" <dan...@ffwll.ch> wrote:
>
> On Tue, Nov 29, 2016 at 05:00:55PM +, Robert Bragg wrote:
> > This adds a 'Perf' section to i915.rst with the following sub sections:
> > - Overview
> > - Comparison with Core Perf
> > - i915 Driver Entry Points
> > - i915 Perf Stream
> > - i915 Perf Observation Architecture Stream
> > - All i915 Perf Internals
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
>
> Two style bikesheds below, feel free to ignore.
>
> > ---
> >  Documentation/gpu/i915.rst   |  92 +
> >  drivers/gpu/drm/i915/i915_drv.h  | 151 
> >  drivers/gpu/drm/i915/i915_perf.c | 289 ++
> >  3 files changed, 478 insertions(+), 54 deletions(-)
> >
> > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> > index 117d2ab..714bd4b 100644
> > --- a/Documentation/gpu/i915.rst
> > +++ b/Documentation/gpu/i915.rst
> > @@ -356,4 +356,96 @@ switch_mm
> >  .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
> > :doc: switch_mm tracepoint
> >
> > +Perf
> > +
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf Overview
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf History and Comparison with Core Perf
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :doc: i915 Perf File Operations
>
> You have the headings in the DOC comments itself, which works until
> someone reorganizes stuff. Then it tends to fall apart badly.

Yeah, could be better.

>
> > +
> > +i915 Driver Entry Points
> > +
> > +
> > +This section covers the entrypoints exported outside of i915_perf.c to
> > +integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_init
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_fini
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_register
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_unregister
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_open_ioctl
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
> > +   :functions: i915_perf_release
>
> One potential issue with listing everything explicitly is that if someone
> ever (and this is bound to happen) adds a new function, they'll forget to
> add it. Hence we just pull them all in, and if you want to reference some
> specifically, do that in the overview sections. And also sprinkle lots of
> cross-references all over to make groups of functions easier to discover.
Without any structure it just didn't seem like documentation; just a dump
of internals info which didn't look that helpful to me.

There are some fairly noteworthy separations of responsibilities between
functions providing the i915 perf stream infrastructure and then the code
for OA unit streams, and then code specifically for Haswell. Splitting them
into sections seems worthwhile to me.

I don't think it's that big a deal listing symbols individually and
maintaining this when adding new symbols. Having maintained Cogl using
gtk-doc where symbols had to be explicitly listed in a -sections.txt file
to add them to the documentation, that wasn't too bad. It's true that
slip ups happen, but having control over the order symbols are presented
seems good to me. It could be nice if an ordered list could be passed to
:functions: to reduce how many times the corresponding perl script runs.

It could maybe be good if there was some way to tag/label symbols in
kerneldoc for selection in restructured text. It would also be nice if the
tooling understood what symbols were in the document so far, so it could
somehow be possible to have a dumping ground section for 'everything else'
at the end.

Regards,
- Robert

>
> But in the end your docs, your turf ;-)
> -Daniel
>
> > +
> > +i915 Perf Stream
> > +
> > +
> > +This section covers the stream-semantics-agnostic structures and functions
> > +for representing an i915 perf stream FD and associated file operations.
> > +
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> > +   :functions: i915_perf_stream
> > +.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
> > +   :functions: i915_perf_stream_ops
> > +
> > +.. kernel-doc:: drivers

Re: [Intel-gfx] [PATCH] drm/i915/perf: use DRM_DEBUG for userspace issues

2016-12-02 Thread Robert Bragg
On Fri, Dec 2, 2016 at 8:35 AM, Daniel Vetter <dan...@ffwll.ch> wrote:

> On Thu, Dec 01, 2016 at 05:21:52PM +0000, Robert Bragg wrote:
> > Avoid using DRM_ERROR for conditions userspace can trigger with a bad
> > config when opening a stream or from not reading data in a timely
> > fashion (whereby the OA buffer fills up). These conditions are tested
> > by i-g-t which treats error messages as failures if using the test
> > runner. This wasn't an issue while the i915-perf igt tests were being
> > run in isolation.
> >
> > One message relating to seeing a spurious zeroed report was changed to
> > use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
> > seen, but it's not a serious problem if it is. Considering that the
> > tail margin mechanism is only a heuristic it's possible we might see
> > this from time to time.
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> >
> > fix i915_perf dbg messages
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 42 --
> --
> >  1 file changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 9551282..5705005 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -474,7 +474,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
> >* copying it to userspace...
> >*/
> >   if (report32[0] == 0) {
> > - DRM_ERROR("Skipping spurious, invalid OA report\n");
> > + DRM_NOTE("Skipping spurious, invalid OA report\n");
> >   continue;
>
> The above looks like a genuine hw/kernel fail, which we shouldn't put
> under the carpet. I'd leave it at DRM_ERROR - I can bikeshed that while
> applying if you're ok. Otherwise lgtm, will apply as soon as we've
> clarified that.
>

It's something that is unfortunately expected to be possible from time to
time due to a hardware race condition between the OA unit updating the tail
pointer for a new report and that report actually becoming visible to the
cpu in memory.

If/when it happens it's not really a significant problem for userspace
(assuming it's rare/intermittent given what the driver does as a
best-effort workaround here). Userspace sees a briefly lower sampling
resolution but the metrics can still be normalized.

We wouldn't want i-g-t failing in this case, so that's why I was changing
it.

It's not really something you want to see ideally (it implies our
heuristic-based software workaround isn't perfect). If it's seen a lot then
that certainly should be considered a warning that we need to try and
improve how we work around the race condition. If you see it rarely then it's
somewhere between a note and a warning, I suppose.
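
For anyone following along without the source open, the workaround amounts to something like the sketch below. This is a simplified userspace illustration, not the driver code: `SKETCH_REPORT_SIZE` and `sketch_copy_reports()` are made-up names, and the real report size depends on the OA format in use. The only part taken from the patch is the test itself, where a report whose first dword is zero is skipped rather than copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed report size for this sketch. */
#define SKETCH_REPORT_SIZE 64

/*
 * Copy n_reports fixed-size reports from oa_buf to out, skipping any
 * whose first 32-bit word is zero (treated as not yet written by the
 * hardware). Returns the number of reports copied and stores the
 * number skipped in *n_skipped. out must have room for n_reports.
 */
static size_t sketch_copy_reports(const uint8_t *oa_buf, size_t n_reports,
				  uint8_t *out, size_t *n_skipped)
{
	size_t copied = 0;
	size_t i;

	*n_skipped = 0;
	for (i = 0; i < n_reports; i++) {
		const uint8_t *report = oa_buf + i * SKETCH_REPORT_SIZE;
		uint32_t first_dword;

		memcpy(&first_dword, report, sizeof(first_dword));
		if (first_dword == 0) {
			/* Spurious report: drop it and carry on. */
			(*n_skipped)++;
			continue;
		}

		memcpy(out + copied * SKETCH_REPORT_SIZE, report,
		       SKETCH_REPORT_SIZE);
		copied++;
	}
	return copied;
}
```

A skipped report simply leaves a gap in the output, which is why userspace sees a briefly lower sampling resolution rather than an error.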

Regards,
- Robert


> -Daniel
>
> >   }
> >
> > @@ -551,7 +551,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
> >   if (ret)
> >   return ret;
> >
> > - DRM_ERROR("OA buffer overflow: force restart\n");
> > + DRM_DEBUG("OA buffer overflow: force restart\n");
> >
> >   dev_priv->perf.oa.ops.oa_disable(dev_priv);
> >   dev_priv->perf.oa.ops.oa_enable(dev_priv);
> > @@ -1000,17 +1000,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> >* IDs
> >*/
> >   if (!dev_priv->perf.metrics_kobj) {
> > - DRM_ERROR("OA metrics weren't advertised via sysfs\n");
> > + DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
> >   return -EINVAL;
> >   }
> >
> >   if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
> > - DRM_ERROR("Only OA report sampling supported\n");
> > + DRM_DEBUG("Only OA report sampling supported\n");
> >   return -EINVAL;
> >   }
> >
> >   if (!dev_priv->perf.oa.ops.init_oa_buffer) {
> > - DRM_ERROR("OA unit not supported\n");
> > + DRM_DEBUG("OA unit not supported\n");
> >   return -ENODEV;
> >   }
> >
> > @@ -1019,17 +1019,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> >* we currently only allow exclusive access
> >*/
> >   if (dev_priv->perf.o

[Intel-gfx] [PATCH] drm/i915/perf: use DRM_DEBUG for userspace issues

2016-12-01 Thread Robert Bragg
Avoid using DRM_ERROR for conditions userspace can trigger with a bad
config when opening a stream or from not reading data in a timely
fashion (whereby the OA buffer fills up). These conditions are tested
by i-g-t which treats error messages as failures if using the test
runner. This wasn't an issue while the i915-perf igt tests were being
run in isolation.

One message relating to seeing a spurious zeroed report was changed to
use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
seen, but it's not a serious problem if it is. Considering that the
tail margin mechanism is only a heuristic it's possible we might see
this from time to time.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>

fix i915_perf dbg messages
---
 drivers/gpu/drm/i915/i915_perf.c | 42 
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9551282..5705005 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -474,7 +474,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_ERROR("Skipping spurious, invalid OA report\n");
+   DRM_NOTE("Skipping spurious, invalid OA report\n");
continue;
}
 
@@ -551,7 +551,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (ret)
return ret;
 
-   DRM_ERROR("OA buffer overflow: force restart\n");
+   DRM_DEBUG("OA buffer overflow: force restart\n");
 
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
@@ -1000,17 +1000,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 * IDs
 */
if (!dev_priv->perf.metrics_kobj) {
-   DRM_ERROR("OA metrics weren't advertised via sysfs\n");
+   DRM_DEBUG("OA metrics weren't advertised via sysfs\n");
return -EINVAL;
}
 
if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
-   DRM_ERROR("Only OA report sampling supported\n");
+   DRM_DEBUG("Only OA report sampling supported\n");
return -EINVAL;
}
 
if (!dev_priv->perf.oa.ops.init_oa_buffer) {
-   DRM_ERROR("OA unit not supported\n");
+   DRM_DEBUG("OA unit not supported\n");
return -ENODEV;
}
 
@@ -1019,17 +1019,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 * we currently only allow exclusive access
 */
if (dev_priv->perf.oa.exclusive_stream) {
-   DRM_ERROR("OA unit already in use\n");
+   DRM_DEBUG("OA unit already in use\n");
return -EBUSY;
}
 
if (!props->metrics_set) {
-   DRM_ERROR("OA metric set not specified\n");
+   DRM_DEBUG("OA metric set not specified\n");
return -EINVAL;
}
 
if (!props->oa_format) {
-   DRM_ERROR("OA report format not specified\n");
+   DRM_DEBUG("OA report format not specified\n");
return -EINVAL;
}
 
@@ -1384,7 +1384,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
if (IS_ERR(specific_ctx)) {
ret = PTR_ERR(specific_ctx);
if (ret != -EINTR)
-   DRM_ERROR("Failed to look up context with ID %u for opening perf stream\n",
+   DRM_DEBUG("Failed to look up context with ID %u for opening perf stream\n",
  ctx_handle);
goto err;
}
@@ -1397,7 +1397,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 */
if (!specific_ctx &&
i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
+   DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
}
@@ -1476,7 +1476,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
memset(props, 0, sizeof(struct perf_open_properties));
 
if (!n_props) {
-   DRM_ERROR("No i915 perf properties given");
+   DRM_D

[Intel-gfx] [RFC] drm: Enable dynamic debug for DRM_[DEV]_DEBUG*

2016-12-01 Thread Robert Bragg
I'm currently considering the use of DRM_ERROR in i915 perf for stream
config validation errors (i.e. userspace misconfigurations) that should
be changed so that i-g-t tests aren't treated as failures when
triggering these.

I initially proposed changing these to DRM_INFO messages and
intentionally wanted to avoid DRM_DEBUG since in my limited experience
DRM_DEBUG messages aren't practical to work with.

I thought I'd see if DRM_DEBUG could be updated to have a bit more fine
grained control in case that might help sway my view.

Tbh, although I think something like this could be nice to have, I'm
still not really convinced that debug messages are a great fit for
helping userspace developers hitting EINVAL errors. Such developers
don't need to be drm/i915 developers and imho shouldn't be expected to
know of the existence of optional debug messages, and if you don't know
of their existence then the control interface isn't important and they
won't help anyone.

--- >8 --- (git am --scissors)

Dynamic debug messages (ref: Documentation/dynamic-debug-howto.txt)
allow fine grained control over which debug messages are enabled with
runtime control through /sys/kernel/debug/dynamic_debug/control

This provides more control than the current drm.drm_debug parameter
which for some use cases is impractical to use given how chatty
some drm debug categories are.

For example all debug messages in i915_perf.c can be enabled with:
echo "file i915_perf.c +p" > dynamic_debug/control

This aims to maintain compatibility with controlling debug messages
using the drm_debug parameter. The new dynamic debug macros are called
by default, but [dev_]printk is called directly if the category flag
is set (side stepping the dynamic debug condition in that case).
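
The decision described above can be sketched in plain userspace C. This is
only an illustration of the intended precedence, not the macros from this
patch: the sketch_* names, the category bits and the callsite struct are all
hypothetical stand-ins for the DRM_UT_* flags, the drm_debug parameter and
the per-call-site state that dynamic debug keeps.

```c
#include <stdbool.h>

/* Hypothetical category bits, standing in for the DRM_UT_* flags. */
#define SKETCH_UT_CORE   0x01u
#define SKETCH_UT_DRIVER 0x02u

/* Hypothetical stand-in for the drm_debug module parameter. */
static unsigned int sketch_drm_debug;

/* Hypothetical stand-in for the per-call-site state dynamic debug tracks. */
struct sketch_callsite {
	const char *file;
	int line;
	bool enabled;	/* what "+p" would toggle for this call site */
};

/*
 * A message is printed if its category is enabled via the legacy
 * parameter (side stepping the per-call-site check), otherwise the
 * decision falls through to the per-call-site switch.
 */
static bool sketch_should_print(const struct sketch_callsite *site,
				unsigned int category)
{
	if (sketch_drm_debug & category)
		return true;

	return site->enabled;
}
```

With sketch_drm_debug left at zero only call sites that were individually
enabled print anything, while setting a category bit keeps the old
all-or-nothing behaviour for that category.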

This removes the drm_[dev_]printk wrappers considering that the dynamic
debug macros are only useful if they can track the __FILE__, __func__
and __LINE__ where they are called. The wrapper didn't seem necessary in
the DRM_UT_NONE case with no category flag.

The output format should be compatible, unless the _DEV macros are
passed a NULL dev pointer considering how the core.c dev_printk
implementation adds "(NULL device *)" to the message in that case while
the drm wrapper would fallback to a plain printk in this case.
Previously some of the non-dev drm debug macros were defined in terms of
passing NULL to a dev version but that's avoided now due to this
difference.

I haven't so far looked to see what effect these have on linked object
sizes.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_drv.c |  47 -
 include/drm/drmP.h| 168 +-
 2 files changed, 108 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index cc6c253..5b2dbcd 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -65,53 +65,6 @@ static struct idr drm_minors_idr;
 
 static struct dentry *drm_debugfs_root;
 
-#define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
-
-void drm_dev_printk(const struct device *dev, const char *level,
-   unsigned int category, const char *function_name,
-   const char *prefix, const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   if (dev)
-   dev_printk(level, dev, DRM_PRINTK_FMT, function_name, prefix,
-  &vaf);
-   else
-   printk("%s" DRM_PRINTK_FMT, level, function_name, prefix, &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_dev_printk);
-
-void drm_printk(const char *level, unsigned int category,
-   const char *format, ...)
-{
-   struct va_format vaf;
-   va_list args;
-
-   if (category != DRM_UT_NONE && !(drm_debug & category))
-   return;
-
-   va_start(args, format);
-   vaf.fmt = format;
-   vaf.va = &args;
-
-   printk("%s" "[" DRM_NAME ":%ps]%s %pV",
-  level, __builtin_return_address(0),
-  strcmp(level, KERN_ERR) == 0 ? " *ERROR*" : "", &vaf);
-
-   va_end(args);
-}
-EXPORT_SYMBOL(drm_printk);
-
 /*
  * DRM Minors
  * A DRM device can provide several char-dev interfaces on the DRM-Major. Each
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index b352a7b..d61d937 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -129,7 +130,6 @@ struct dma_buf_attachment;
  * run-time by echoing the debug value in its sysfs node:
  *   # echo 0xf > /sys/module/drm/parameters/debug
  */
-#define DRM_

[Intel-gfx] [PATCH] drm/i915/perf: More documentation hooked to i915.rst

2016-11-29 Thread Robert Bragg
This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 Documentation/gpu/i915.rst   |  92 +
 drivers/gpu/drm/i915/i915_drv.h  | 151 
 drivers/gpu/drm/i915/i915_perf.c | 289 +++
 3 files changed, 478 insertions(+), 54 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 117d2ab..714bd4b 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -356,4 +356,96 @@ switch_mm
 .. kernel-doc:: drivers/gpu/drm/i915/i915_trace.h
:doc: switch_mm tracepoint
 
+Perf
+====
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf Overview
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf History and Comparison with Core Perf
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :doc: i915 Perf File Operations
+
+i915 Driver Entry Points
+------------------------
+
+This section covers the entrypoints exported outside of i915_perf.c to
+integrate with drm/i915 and to handle the `DRM_I915_PERF_OPEN` ioctl.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_fini
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_register
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_unregister
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_release
+
+i915 Perf Stream
+----------------
+
+This section covers the stream-semantics-agnostic structures and functions
+for representing an i915 perf stream FD and associated file operations.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_perf_stream_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: read_properties_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_open_ioctl_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_destroy_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_ioctl
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_enable_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_disable_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_perf_poll_unlocked
+
+i915 Perf Observation Architecture Stream
+-----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: OA_BUFFER_SIZE
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
+   :functions: i915_oa_ops
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_init
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_read
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_enable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_stream_disable
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_wait_unlocked
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :functions: i915_oa_poll_wait
+
+All i915 Perf Internals
+-----------------------
+
+This section simply includes all currently documented i915 perf internals, in
+no particular order, but may include some more minor utilities or platform
+specific details than found in the more high-level sections.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
+   :internal:
+
 .. WARNING: DOCPROC directive not supported: !Cdrivers/gpu/drm/i915/i915_irq.c
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1ec9619..9f92755 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1827,89 +1827,186 @@ struct i915_oa_reg {
 
 struct i915_perf_stream;
 
+/**
+ * struct i915_perf_stream_ops - the OPs to support a specific stream type
+ */
 struct i915_perf_stream_ops {
-   /* Enables the collection of HW samples, either in response to
-* I915_PERF_IOCTL_ENABLE or implicitly called when stream is
-* opened without I915_PERF_FLAG_DISABLED.
+   /**
+* @enable: Enables the collection of HW samples, either in response to
+* I915_PERF_IOCTL_ENABLE or implicitly called when stream is opened
+* without I915_PERF_F

[Intel-gfx] [PATCH] drm/i915/perf: use DRM_INFO for userspace issues

2016-11-29 Thread Robert Bragg
Avoid using DRM_ERROR for conditions userspace can trigger with a bad
config when opening a stream or from not reading data in a timely
fashion (whereby the OA buffer fills up). These conditions are tested
by i-g-t which treats error messages as failures if using the test
runner. This wasn't an issue while the i915-perf igt tests were being
run in isolation.

DRM_INFO was used over DRM_DEBUG since it's proven convenient while
working on gputop and mesa to have a message on the console for a
malformed config without needing to explicitly enable drm debug messages
which can be very verbose.

One message relating to seeing a spurious zeroed report was changed to
use DRM_WARN instead of DRM_ERROR. Ideally this warning shouldn't be
seen, but it's not a serious problem if it is. Considering that the
tail margin mechanism is only a heuristic it's possible we might see
this from time to time.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_perf.c | 46 
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9551282..68b7c27 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -474,7 +474,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 * copying it to userspace...
 */
if (report32[0] == 0) {
-   DRM_ERROR("Skipping spurious, invalid OA report\n");
+   DRM_WARN("Skipping spurious, invalid OA report\n");
continue;
}
 
@@ -551,7 +551,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
if (ret)
return ret;
 
-   DRM_ERROR("OA buffer overflow: force restart\n");
+   DRM_INFO("OA buffer overflow: force restart\n");
 
dev_priv->perf.oa.ops.oa_disable(dev_priv);
dev_priv->perf.oa.ops.oa_enable(dev_priv);
@@ -1000,17 +1000,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 * IDs
 */
if (!dev_priv->perf.metrics_kobj) {
-   DRM_ERROR("OA metrics weren't advertised via sysfs\n");
+   DRM_INFO("OA metrics weren't advertised via sysfs\n");
return -EINVAL;
}
 
if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
-   DRM_ERROR("Only OA report sampling supported\n");
+   DRM_INFO("Only OA report sampling supported\n");
return -EINVAL;
}
 
if (!dev_priv->perf.oa.ops.init_oa_buffer) {
-   DRM_ERROR("OA unit not supported\n");
+   DRM_INFO("OA unit not supported\n");
return -ENODEV;
}
 
@@ -1019,17 +1019,17 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 * we currently only allow exclusive access
 */
if (dev_priv->perf.oa.exclusive_stream) {
-   DRM_ERROR("OA unit already in use\n");
+   DRM_INFO("OA unit already in use\n");
return -EBUSY;
}
 
if (!props->metrics_set) {
-   DRM_ERROR("OA metric set not specified\n");
+   DRM_INFO("OA metric set not specified\n");
return -EINVAL;
}
 
if (!props->oa_format) {
-   DRM_ERROR("OA report format not specified\n");
+   DRM_INFO("OA report format not specified\n");
return -EINVAL;
}
 
@@ -1384,8 +1384,8 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
if (IS_ERR(specific_ctx)) {
ret = PTR_ERR(specific_ctx);
if (ret != -EINTR)
-   DRM_ERROR("Failed to look up context with ID %u for opening perf stream\n",
- ctx_handle);
+   DRM_INFO("Failed to look up context with ID %u for opening perf stream\n",
+ctx_handle);
goto err;
}
}
@@ -1397,7 +1397,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 */
if (!specific_ctx &&
i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-   DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
+   DRM_INFO("Insufficient privileges to open system-wide i915 perf stream\n");
ret = -EACCES;
goto err_ctx;
}
@@ -1476,7 +1476,7 @@ static

Re: [Intel-gfx] [PATCH v3] drm/i915/perf: Wrap 64bit divides in do_div()

2016-11-29 Thread Robert Bragg
On Wed, Nov 23, 2016 at 3:07 PM, Chris Wilson <ch...@chris-wilson.co.uk>
wrote:

> Just a couple of naked 64bit divides causing link errors on 32bit
> builds, with:
>
> ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
>
> v2: do_div() is only u64/u32, we need a u32/u64!
> v3: div_u64() == u64/u32, div64_u64() == u64/u64
>
> Reported-by: kbuild test robot <fengguang...@intel.com>
> Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA
> unit")
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Robert Bragg <rob...@sixbynine.org>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_
> perf.c
> index 95512824922b..14de9a4eee27 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -974,8 +974,8 @@ static void i915_oa_stream_disable(struct
> i915_perf_stream *stream)
>
>  static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int
> exponent)
>  {
> -   return 1000000000ULL * (2ULL << exponent) /
> -   dev_priv->perf.oa.timestamp_frequency;
> +   return div_u64(1000000000ULL * (2ULL << exponent),
> +  dev_priv->perf.oa.timestamp_frequency);
>  }
>
>  static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> @@ -1051,16 +1051,17 @@ static int i915_oa_stream_init(struct
> i915_perf_stream *stream,
>
> dev_priv->perf.oa.periodic = props->oa_periodic;
> if (dev_priv->perf.oa.periodic) {
> -   u64 period_ns = oa_exponent_to_ns(dev_priv,
> -
>  props->oa_period_exponent);
> +   u32 tail;
>
> dev_priv->perf.oa.period_exponent =
> props->oa_period_exponent;
>
> /* See comment for OA_TAIL_MARGIN_NSEC for details
>  * about this tail_margin...
>  */
> -   dev_priv->perf.oa.tail_margin =
> -   ((OA_TAIL_MARGIN_NSEC / period_ns) + 1) *
> format_size;
> +   tail = div64_u64(OA_TAIL_MARGIN_NSEC,
> +    oa_exponent_to_ns(dev_priv,
> +
> props->oa_period_exponent));
> +   dev_priv->perf.oa.tail_margin = (tail + 1) * format_size;
> }
>
> if (stream->ctx) {
> --
> 2.10.2
>
>
This looks good to me, thanks.

Reviewed-by: Robert Bragg <rob...@sixbynine.org>


[Intel-gfx] [PATCH igt] igt/gem_exec_parse: generalise test_lri + debug info

2016-11-24 Thread Robert Bragg
This further generalises the description passed to test_lri so we only
need one loop over the entries, with test_lri deducing the expected errno
and value based on whether the register is marked as whitelisted and
depending on the current command parser version.

Each tested register LRI now gets its own subtest, like:
igt_subtest_f("test-lri-%s", reg_name)

The test_lri helper now also double checks that the initial
intel_register_write() takes effect before issuing the LRI.

In case of a failure the test_lri helper now uses igt_debug to log the
register name, address and value being tested.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
---
 tests/gem_exec_parse.c | 102 +
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index cc2103a..0cd7053 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -258,12 +258,17 @@ static void exec_batch_chained(int fd, uint32_t cmd_bo, uint32_t *cmds,
  * from...
  */
 struct test_lri {
-   uint32_t reg, read_mask, init_val, test_val;
+   const char *name; /* register name for debug info */
+   uint32_t reg; /* address to test */
+   uint32_t read_mask; /* ignore things like HW status bits */
+   uint32_t init_val; /* initial identifiable value to set without LRI */
+   uint32_t test_val; /* value to attempt loading via LRI command */
+   bool whitelisted; /* expect to become NOOP / fail if not whitelisted */
+   int min_ver; /* required command parser version to test */
 };
 
 static void
-test_lri(int fd, uint32_t handle,
-struct test_lri *test, int expected_errno, uint32_t expect)
+test_lri(int fd, uint32_t handle, struct test_lri *test)
 {
uint32_t lri[] = {
MI_LOAD_REGISTER_IMM,
@@ -271,9 +276,20 @@ test_lri(int fd, uint32_t handle,
test->test_val,
MI_BATCH_BUFFER_END,
};
+   int bad_lri_errno = parser_version >= 8 ? 0 : -EINVAL;
+   int expected_errno = test->whitelisted ? 0 : bad_lri_errno;
+   uint32_t expect = test->whitelisted ? test->test_val : test->init_val;
+
+   igt_debug("Testing %s LRI: addr=%x, val=%x, expected errno=%d, expected val=%x\n",
+ test->name, test->reg, test->test_val,
+ expected_errno, expect);
 
intel_register_write(test->reg, test->init_val);
 
+   igt_assert_eq_u32((intel_register_read(test->reg) &
+  test->read_mask),
+ test->init_val);
+
exec_batch(fd, handle,
   lri, sizeof(lri),
   I915_EXEC_RENDER,
@@ -476,57 +492,43 @@ igt_main
}
 
igt_subtest_group {
+#define REG(R, MSK, INI, V, OK, MIN_V) { #R, R, MSK, INI, V, OK, MIN_V }
+   struct test_lri lris[] = {
+   /* dummy head pointer */
+   REG(OASTATUS2,
+   0xff80, 0xdeadf000, 0xbeeff000, false, 0),
+   /* NB: [1:0] MBZ */
+   REG(SO_WRITE_OFFSET_0,
+   0xfffc, 0xabcdabc0, 0xbeefbee0, true, 0),
+
+   /* It's really important for us to check that
+* an LRI to OACONTROL doesn't result in an
+* EINVAL error because Mesa attempts writing
+* to OACONTROL to determine what extensions to
+* expose and will abort() for execbuffer()
+* errors.
+*
+* Mesa can gracefully recognise and handle the
+* LRI becoming a NOOP.
+*
+* The test values represent dummy context IDs
+* while leaving the OA unit disabled
+*/
+   REG(OACONTROL,
+   0xf000, 0xfeed, 0x31337000, false, 9)
+   };
+#undef REG
+
igt_fixture {
intel_register_access_init(intel_get_pci_device(), 0);
}
 
-   igt_subtest("registers") {
-   struct test_lri bad_lris[] = {
-   /* dummy head pointer */
-   { OASTATUS2, 0xff80, 0xdeadf000, 0xbeeff000 }
-   };
-   struct test_lri v9_bad_lris[] = {
-   /* It's really important for us to check that
-* an LRI to OACONTROL doesn't result in an
-* EINVAL error because Mesa attempts writing
-* to OACONTROL to determine what

Re: [Intel-gfx] [PATCH] drm/i915/perf: Wrap 64bit divides in do_div()

2016-11-23 Thread Robert Bragg
On Nov 22, 2016 23:49, "Chris Wilson" <ch...@chris-wilson.co.uk> wrote:
>
> On Tue, Nov 22, 2016 at 11:32:38PM +0000, Robert Bragg wrote:
> >Thanks for sending out. It looked good to me, but testing shows a
> >'divide error'.
> >
> >I haven't double checked, but I think it's because the max OA
> >exponent (31) converted to nanoseconds is > UINT32_MAX with the
> >lower 32bits zero and the do_div denominator argument is only 32bit.
>
> Hmm, I thought do_div() was u64 / u64, but no it is u64 / u32. Looks
> like the appropriate function would be div64_u64().
>
> >It corresponds to a 5 minute period which is a bit silly, so we could
> >reduce the max exponent. A period of UINT32_MAX is about 4 seconds
> >where I can't currently think of a good use case for such a low
> >frequency.
> >
> >Instead of changing the max OA exponent (where the relationship to
> >the period changes for gen9 and may become fuzzy if we start training
> >our view of the gpu timestamp frequency instead of using constants)
> >maybe we should set an early limit on an exponent resulting in a
> >period > UINT32_MAX?
>
> Seems like picking the right function would help!

Or that, yep. Sounds good to me, thanks.
- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] drm/i915/perf: Wrap 64bit divides in do_div()

2016-11-22 Thread Robert Bragg
Thanks for sending out. It looked good to me, but testing shows a 'divide
error'.

I haven't double checked, but I think it's because the max OA exponent (31)
converted to nanoseconds is > UINT32_MAX with the lower 32bits zero and the
do_div denominator argument is only 32bit.
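
To make that concrete, here is a minimal userspace sketch of the arithmetic.
It assumes a Haswell timestamp frequency of 12500000 Hz and a
nanoseconds-per-second numerator, neither of which is spelled out in this
thread, and the two divide helpers are hypothetical stand-ins that only model
the u64/u32 versus u64/u64 behaviour; they are not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC		1000000000ULL
/* Assumed Haswell OA timestamp frequency in Hz. */
#define SKETCH_TIMESTAMP_FREQUENCY	12500000ULL

/* Period in nanoseconds for a given OA exponent (0..31). */
static uint64_t sketch_oa_exponent_to_ns(int exponent)
{
	assert(exponent >= 0 && exponent <= 31);

	return SKETCH_NSEC_PER_SEC * (2ULL << exponent) /
		SKETCH_TIMESTAMP_FREQUENCY;
}

/*
 * Models a u64/u32 divide: the divisor is silently truncated to 32
 * bits. Returns false where the truncated divisor is zero, which is
 * where real hardware would raise a divide error.
 */
static bool sketch_div_u64_by_u32(uint64_t dividend, uint64_t divisor,
				  uint64_t *quotient)
{
	uint32_t divisor32 = (uint32_t)divisor;

	if (divisor32 == 0)
		return false;

	*quotient = dividend / divisor32;
	return true;
}

/* Models a full u64/u64 divide, which keeps the whole divisor. */
static uint64_t sketch_div_u64_by_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

Under those assumptions exponent 31 gives a period of 80 * 2^32 ns, whose low
32 bits are all zero, so the truncating divide sees a zero denominator while
the full 64-bit divide simply returns zero.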

It corresponds to a 5 minute period which is a bit silly, so we could
reduce the max exponent. A period of UINT32_MAX is about 4 seconds where I
can't currently think of a good use case for such a low frequency.

Instead of changing the max OA exponent (where the relationship to the
period changes for gen9 and may become fuzzy if we start training our view
of the gpu timestamp frequency instead of using constants) maybe we should
set an early limit on an exponent resulting in a period > UINT32_MAX?

- Robert


On Tue, Nov 22, 2016 at 9:14 PM, Chris Wilson <ch...@chris-wilson.co.uk>
wrote:

> Just a couple of naked 64bit divides causing link errors on 32bit
> builds, with:
>
> ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!
>
> Reported-by: kbuild test robot <fengguang...@intel.com>
> Fixes: d79651522e89 ("drm/i915: Enable i915 perf stream for Haswell OA
> unit")
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Robert Bragg <rob...@sixbynine.org>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> index 95512824922b..7d00532ae010 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -974,8 +974,12 @@ static void i915_oa_stream_disable(struct
> i915_perf_stream *stream)
>
>  static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int
> exponent)
>  {
> -   return 1000000000ULL * (2ULL << exponent) /
> -   dev_priv->perf.oa.timestamp_frequency;
> +   u64 interval;
> +
> +   interval = 1000000000ULL * (2ULL << exponent);
> +   do_div(interval, dev_priv->perf.oa.timestamp_frequency);
> +
> +   return interval;
>  }
>
>  static const struct i915_perf_stream_ops i915_oa_stream_ops = {
> @@ -1051,16 +1055,17 @@ static int i915_oa_stream_init(struct
> i915_perf_stream *stream,
>
> dev_priv->perf.oa.periodic = props->oa_periodic;
> if (dev_priv->perf.oa.periodic) {
> -   u64 period_ns = oa_exponent_to_ns(dev_priv,
> -
>  props->oa_period_exponent);
> +   u64 margin;
>
> dev_priv->perf.oa.period_exponent =
> props->oa_period_exponent;
>
> /* See comment for OA_TAIL_MARGIN_NSEC for details
>  * about this tail_margin...
>  */
> -   dev_priv->perf.oa.tail_margin =
> -   ((OA_TAIL_MARGIN_NSEC / period_ns) + 1) *
> format_size;
> +   margin = OA_TAIL_MARGIN_NSEC;
> +   do_div(margin,
> +  oa_exponent_to_ns(dev_priv,
> props->oa_period_exponent));
> +   dev_priv->perf.oa.tail_margin = (margin + 1) * format_size;
> }
>
> if (stream->ctx) {
> --
> 2.10.2
>
>


[Intel-gfx] [PATCH igt] igt/gem_exec_parse: test_lri check init + add debug msg

2016-11-22 Thread Robert Bragg
Just to note I haven't tested yet as I don't have hsw to hand, but seems
simple enough to send out anyway...

--- >8 ---

To make it clear on failure what register was being tested the test_lri
helper now uses igt_debug to log the register address and value being
tested. The test_lri helper now also double checks that the initial
intel_register_write() takes effect before issuing the LRI.

Signed-off-by: Robert Bragg <rob...@sixbynine.org>
---
 tests/gem_exec_parse.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index cc2103a..534a933 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -272,8 +272,15 @@ test_lri(int fd, uint32_t handle,
MI_BATCH_BUFFER_END,
};
 
+   igt_debug("testing lri, reg=%x, val=%x, expected errno=%d\n",
+ test->reg, test->test_val, expected_errno);
+
intel_register_write(test->reg, test->init_val);
 
+   igt_assert_eq_u32((intel_register_read(test->reg) &
+  test->read_mask),
+ test->init_val);
+
exec_batch(fd, handle,
   lri, sizeof(lri),
   I915_EXEC_RENDER,
-- 
2.10.2



Re: [Intel-gfx] [PATCH v2] drm/i915: don't whitelist oacontrol in cmd parser

2016-11-22 Thread Robert Bragg
On Tue, Nov 22, 2016 at 1:34 PM, Daniel Vetter <dan...@ffwll.ch> wrote:

> On Tue, Nov 08, 2016 at 12:51:48PM +0000, Robert Bragg wrote:
> > This v2 patch bumps the command parser version so it can be referenced in
> > corresponding i-g-t gem_exec_parse changes.
> >
> > --- >8 ---
>
> Scissors cut everything below, not everything above, hence next time
> around pls switch around your comment and the commit message, as-is not
> much left ;-)
>

Hmm, they cut away what's above and keep what's below in my experience -
what command are you seeing the opposite with?

I just double checked this with git am --scissors

- Robert



>
> Fixed up while applying.
> -Daniel
>
> >
> > Being able to program OACONTROL from a non-privileged batch buffer is
> > not sufficient to be able to configure the OA unit. This was originally
> > allowed to help enable Mesa to expose OA counters via the
> > INTEL_performance_query extension, but the current implementation based
> > on programming OACONTROL via a batch buffer isn't able to report useable
> > data without a more complete OA unit configuration. Mesa handles the
> > possibility that writes to OACONTROL may not be allowed and so only
> > advertises the extension after explicitly testing that a write to
> > OACONTROL succeeds. Based on this, removing OACONTROL from the whitelist
> > should be ok for userspace.
> >
> > Removing this simplifies adding a new kernel api for configuring the OA
> > unit without needing to consider the possibility that userspace might
> > trample on OACONTROL state which we'd like to start managing within
> > the kernel instead. In particular running any Mesa based GL application
> > currently results in clearing OACONTROL when initializing which would
> > disable the capturing of metrics.
> >
> > v2:
> > This bumps the command parser version from 8 to 9, as the change is
> > visible to userspace.
> >
> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> > Reviewed-by: Matthew Auld <matthew.a...@intel.com>
> > Reviewed-by: Sourab Gupta <sourab.gu...@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 42
> --
> >  1 file changed, 5 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c
> b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index c9d2ecd..f5762cd 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -450,7 +450,6 @@ static const struct drm_i915_reg_descriptor
> gen7_render_regs[] = {
> >   REG64(PS_INVOCATION_COUNT),
> >   REG64(PS_DEPTH_COUNT),
> >   REG64_IDX(RING_TIMESTAMP, RENDER_RING_BASE),
> > - REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below.
> */
> >   REG64(MI_PREDICATE_SRC0),
> >   REG64(MI_PREDICATE_SRC1),
> >   REG32(GEN7_3DPRIM_END_OFFSET),
> > @@ -1060,8 +1059,7 @@ bool intel_engine_needs_cmd_parser(struct
> intel_engine_cs *engine)
> >  static bool check_cmd(const struct intel_engine_cs *engine,
> > const struct drm_i915_cmd_descriptor *desc,
> > const u32 *cmd, u32 length,
> > -   const bool is_master,
> > -   bool *oacontrol_set)
> > +   const bool is_master)
> >  {
> >   if (desc->flags & CMD_DESC_SKIP)
> >   return true;
> > @@ -1099,31 +1097,6 @@ static bool check_cmd(const struct
> intel_engine_cs *engine,
> >   }
> >
> >   /*
> > -  * OACONTROL requires some special handling for
> > -  * writes. We want to make sure that any batch which
> > -  * enables OA also disables it before the end of the
> > -  * batch. The goal is to prevent one process from
> > -  * snooping on the perf data from another process. To do
> > -  * that, we need to check the value that will be written
> > -  * to the register. Hence, limit OACONTROL writes to
> > -  * only MI_LOAD_REGISTER_IMM commands.
> > -  */
> > - if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL))
> {
> > - if (desc->cmd.value ==
> MI_LOAD_REGISTER_MEM) {
> > - DRM_DEBUG_DRIVER("CMD: Rejected
> LRM to OACONTROL\n&
