[PATCH v9 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-11-22 Thread Daniel Vetter
On Mon, Nov 07, 2016 at 07:49:57PM +, Robert Bragg wrote:
> In particular this tries to capture for posterity some of the early
> challenges we had with using the core perf infrastructure in case we
> ever want to revisit adapting perf for device metrics.
> 
> Cc: Chris Wilson 
> Signed-off-by: Robert Bragg 
> Reviewed-by: Matthew Auld 
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 163 
> +++

Missing the include stanza in Documentation/gpu/i915.rst, which means this
won't show up in the docs build. Also function/struct docs are not
included either it seems.

Please check out Documentation/kernel-documentation.rst and build it all
with

$ make DOCBOOKS="" htmldocs

Again fixup patch is fine.
-Daniel

>  1 file changed, 163 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index 1a87fe9..9551282 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -24,6 +24,169 @@
>   *   Robert Bragg 
>   */
>  
> +
> +/**
> + * DOC: i915 Perf, streaming API for GPU metrics
> + *
> + * Gen graphics supports a large number of performance counters that can help
> + * driver and application developers understand and optimize their use of the
> + * GPU.
> + *
> + * This i915 perf interface enables userspace to configure and open a file
> + * descriptor representing a stream of GPU metrics which can then be read() 
> as
> + * a stream of sample records.
> + *
> + * The interface is particularly suited to exposing buffered metrics that are
> + * captured by DMA from the GPU, unsynchronized with and unrelated to the 
> CPU.
> + *
> + * Streams representing a single context are accessible to applications with 
> a
> + * corresponding drm file descriptor, such that OpenGL can use the interface
> + * without special privileges. Access to system-wide metrics requires root
> + * privileges by default, unless changed via the dev.i915.perf_event_paranoid
> + * sysctl option.
> + *
> + *
> + * The interface was initially inspired by the core Perf infrastructure but
> + * some notable differences are:
> + *
> + * i915 perf file descriptors represent a "stream" instead of an "event"; 
> where
> + * a perf event primarily corresponds to a single 64bit value, while a stream
> + * might sample sets of tightly-coupled counters, depending on the
> + * configuration.  For example the Gen OA unit isn't designed to support
> + * orthogonal configurations of individual counters; it's configured for a 
> set
> + * of related counters. Samples for an i915 perf stream capturing OA metrics
> + * will include a set of counter values packed in a compact HW specific 
> format.
> + * The OA unit supports a number of different packing formats which can be
> + * selected by the user opening the stream. Perf has support for grouping
> + * events, but each event in the group is configured, validated and
> + * authenticated individually with separate system calls.
> + *
> + * i915 perf stream configurations are provided as an array of u64 
> (key,value)
> + * pairs, instead of a fixed struct with multiple miscellaneous config 
> members,
> + * interleaved with event-type specific members.
> + *
> + * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
> + * The supported metrics are being written to memory by the GPU 
> unsynchronized
> + * with the CPU, using HW specific packing formats for counter sets. 
> Sometimes
> + * the constraints on HW configuration require reports to be filtered before 
> it
> + * would be acceptable to expose them to unprivileged applications - to hide
> + * the metrics of other processes/contexts. For these use cases a read() 
> based
> + * interface is a good fit, and provides an opportunity to filter data as it
> + * gets copied from the GPU mapped buffers to userspace buffers.
> + *
> + *
> + * Some notes regarding Linux Perf:
> + * 
> + *
> + * The first prototype of this driver was based on the core perf
> + * infrastructure, and while we did make that mostly work, with some changes 
> to
> + * perf, we found we were breaking or working around too many assumptions 
> baked
> + * into perf's currently cpu centric design.
> + *
> + * In the end we didn't see a clear benefit to making perf's implementation 
> and
> + * interface more complex by changing design assumptions while we knew we 
> still
> + * wouldn't be able to use any existing perf based userspace tools.
> + *
> + * Also considering the Gen specific nature of the Observability hardware and
> + * how userspace will sometimes need to combine i915 perf OA metrics with
> + * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
> + * expecting the interface to be used by a platform specific userspace such 
> as
> + * OpenGL or tools. This is to say; we aren't inherently missing out on 
> having
> + * a standard vendor/architecture agnostic interface by not using perf.
> + 

[PATCH v9 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-11-08 Thread sourab gupta
On Mon, 2016-11-07 at 11:49 -0800, Robert Bragg wrote:
> In particular this tries to capture for posterity some of the early
> challenges we had with using the core perf infrastructure in case we
> ever want to revisit adapting perf for device metrics.
> 
> Cc: Chris Wilson 
> Signed-off-by: Robert Bragg 
> Reviewed-by: Matthew Auld 
> ---
Good summary of early challenges faced while adapting core perf.

Reviewed-by: Sourab Gupta 



[PATCH v9 11/11] drm/i915: Add a kerneldoc summary for i915_perf.c

2016-11-07 Thread Robert Bragg
In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson 
Signed-off-by: Robert Bragg 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/i915_perf.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1a87fe9..9551282 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -24,6 +24,169 @@
  *   Robert Bragg 
  */

+
+/**
+ * DOC: i915 Perf, streaming API for GPU metrics
+ *
+ * Gen graphics supports a large number of performance counters that can help
+ * driver and application developers understand and optimize their use of the
+ * GPU.
+ *
+ * This i915 perf interface enables userspace to configure and open a file
+ * descriptor representing a stream of GPU metrics which can then be read() as
+ * a stream of sample records.
+ *
+ * The interface is particularly suited to exposing buffered metrics that are
+ * captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.
+ *
+ * Streams representing a single context are accessible to applications with a
+ * corresponding drm file descriptor, such that OpenGL can use the interface
+ * without special privileges. Access to system-wide metrics requires root
+ * privileges by default, unless changed via the dev.i915.perf_event_paranoid
+ * sysctl option.
+ *
+ *
+ * The interface was initially inspired by the core Perf infrastructure but
+ * some notable differences are:
+ *
+ * i915 perf file descriptors represent a "stream" instead of an "event"; where
+ * a perf event primarily corresponds to a single 64bit value, while a stream
+ * might sample sets of tightly-coupled counters, depending on the
+ * configuration.  For example the Gen OA unit isn't designed to support
+ * orthogonal configurations of individual counters; it's configured for a set
+ * of related counters. Samples for an i915 perf stream capturing OA metrics
+ * will include a set of counter values packed in a compact HW specific format.
+ * The OA unit supports a number of different packing formats which can be
+ * selected by the user opening the stream. Perf has support for grouping
+ * events, but each event in the group is configured, validated and
+ * authenticated individually with separate system calls.
+ *
+ * i915 perf stream configurations are provided as an array of u64 (key,value)
+ * pairs, instead of a fixed struct with multiple miscellaneous config members,
+ * interleaved with event-type specific members.
+ *
+ * i915 perf doesn't support exposing metrics via an mmap'd circular buffer.
+ * The supported metrics are being written to memory by the GPU unsynchronized
+ * with the CPU, using HW specific packing formats for counter sets. Sometimes
+ * the constraints on HW configuration require reports to be filtered before it
+ * would be acceptable to expose them to unprivileged applications - to hide
+ * the metrics of other processes/contexts. For these use cases a read() based
+ * interface is a good fit, and provides an opportunity to filter data as it
+ * gets copied from the GPU mapped buffers to userspace buffers.
+ *
+ *
+ * Some notes regarding Linux Perf:
+ * 
+ *
+ * The first prototype of this driver was based on the core perf
+ * infrastructure, and while we did make that mostly work, with some changes to
+ * perf, we found we were breaking or working around too many assumptions baked
+ * into perf's currently cpu centric design.
+ *
+ * In the end we didn't see a clear benefit to making perf's implementation and
+ * interface more complex by changing design assumptions while we knew we still
+ * wouldn't be able to use any existing perf based userspace tools.
+ *
+ * Also considering the Gen specific nature of the Observability hardware and
+ * how userspace will sometimes need to combine i915 perf OA metrics with
+ * side-band OA data captured via MI_REPORT_PERF_COUNT commands; we're
+ * expecting the interface to be used by a platform specific userspace such as
+ * OpenGL or tools. This is to say; we aren't inherently missing out on having
+ * a standard vendor/architecture agnostic interface by not using perf.
+ *
+ *
+ * For posterity, in case we might re-visit trying to adapt core perf to be
+ * better suited to exposing i915 metrics these were the main pain points we
+ * hit:
+ *
+ * - The perf based OA PMU driver broke some significant design assumptions:
+ *
+ *   Existing perf pmus are used for profiling work on a cpu and we were
+ *   introducing the idea of _IS_DEVICE pmus with different security
+ *   implications, the need to fake cpu-related data (such as user/kernel
+ *   registers) to fit with perf's current design, and adding _DEVICE records
+ *   as a way to forward device-specific status records.
+ *