[Intel-gfx] [PATCH v5 1/3] i915.rst: Narration overview on GEM + minor reorder to improve narration

2018-04-06 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Add a narration to i915.rst about Intel GEN GPUs: engines,
driver context and relocation.

v5:
  More typo fixes.
  Flow bullet list so lines are not too long.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuopp...@linux.intel.com>
---
 Documentation/gpu/i915.rst | 120 -
 1 file changed, 97 insertions(+), 23 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 7ecad7134677..cd2d796d23dd 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,103 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS is the engine for rendering 3D and performing compute; it is named
+  `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; it is named `I915_EXEC_BLT` in user
+  space.
+- VCS is a video encode and decode engine; it is named `I915_EXEC_BSD`
+  in user space.
+- VECS is a video enhancement engine; it is named `I915_EXEC_VEBOX` in user
+  space.
+- The enumeration `I915_EXEC_DEFAULT` does not refer to a specific engine;
+  instead it is to be used by user space to specify a default rendering
+  engine (for 3D) that may or may not be the same as RCS.
+
+The Intel GPU family is a family of integrated GPUs using Unified
+Memory Access. To have the GPU "do work", user space feeds the
+GPU batchbuffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the ioctl
+`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
+to execute will also list all GEM buffer objects that the batchbuffer reads
+and/or writes. For implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+The i915 driver allows user space to create a context via the ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
+integer. Such a context should be viewed by user-space as -loosely-
+analogous to the idea of a CPU process of an operating system. The i915
+driver guarantees that commands issued to a fixed context are to be
+executed so that writes of a previously issued command are seen by
+reads of following commands. Actions issued between different contexts
+(even if from the same file descriptor) are NOT given that guarantee
+and the only way to synchronize across contexts (even from the same
+file descriptor) is through the use of fences. At least as far back as
+Gen4, a context also carries with it a GPU HW context;
+the HW context is essentially (at least most of) the state of a GPU.
+In addition to the ordering guarantees, the kernel will restore GPU
+state via the HW context when commands are issued to a context; this saves
+user space the need to restore (at least most of) the GPU state at the
+start of each batchbuffer. The non-deprecated ioctls to submit batchbuffer
+work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
+to identify what context to use with the command.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each context has its own translation table, called Per-Process
+Graphics Translation Table (PPGTT). Importantly, although the
+PPGTT is named per-process, it is actually per context. When user space
+submits a batchbuffer, the kernel walks the list of GEM buffer objects
+used by the batchbuffer and guarantees that not only is the memory of
+each such GEM buffer object resident but it is also present in the
+(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
+then it is given an address. Two consequences of this are: the kernel
+needs to edit the batchbuffer submitted to write the correct value of
+the GPU address when a GEM BO is assigned a GPU address and the kernel
+might evict a different GEM BO from the (PP)GTT to make address room
+for another GEM BO. Consequently, the ioctls submitting a batchbuffer
+for execution also include a list of all locations within buffers that
+refer to GPU-addresses so that the kernel can edit the buffer correctly.
+This process is dubbed relocation.
+
+GEM BO Management Implementation Details
+
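
For readers following the relocation paragraph above, here is a minimal sketch of the idea in plain C. It is a toy model under stated assumptions, not the i915 implementation: the `toy_bo` and `toy_reloc` types and both functions are hypothetical names invented for illustration (they are not the uapi structures), a bump allocator stands in for real (PP)GTT management, and eviction is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types; these are NOT the i915 uapi structures. */
struct toy_bo {
	uint64_t gpu_addr;	/* address in the (PP)GTT, 0 means "not yet placed" */
	uint64_t size;		/* size of the object in bytes */
};

struct toy_reloc {
	size_t batch_offset;	/* byte offset in the batch that holds an address */
	size_t target;		/* index of the BO whose address belongs there */
	uint64_t delta;		/* byte offset added to the target's GPU address */
};

/* Give every unplaced BO an address; a bump allocator is an assumption. */
static void toy_bind_all(struct toy_bo *bos, size_t nbos, uint64_t *next_free)
{
	for (size_t i = 0; i < nbos; i++) {
		if (bos[i].gpu_addr == 0) {
			bos[i].gpu_addr = *next_free;
			*next_free += bos[i].size;
		}
	}
}

/* Patch the batch so every listed location holds the final GPU address. */
static int toy_relocate(uint8_t *batch, size_t batch_size,
			const struct toy_reloc *relocs, size_t nrelocs,
			const struct toy_bo *bos, size_t nbos)
{
	for (size_t i = 0; i < nrelocs; i++) {
		const struct toy_reloc *r = &relocs[i];
		uint64_t addr;

		/* Reject entries that point outside the batch or at no BO. */
		if (r->target >= nbos || batch_size < sizeof(addr) ||
		    r->batch_offset > batch_size - sizeof(addr))
			return -1;

		addr = bos[r->target].gpu_addr + r->delta;
		memcpy(batch + r->batch_offset, &addr, sizeof(addr));
	}
	return 0;
}
```

A caller would first bind the objects and then relocate the batch. The point of the sketch is only that user space supplies the list of locations, while the kernel side decides the addresses and writes them in.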

[Intel-gfx] [PATCH v5 2/3] i915.rst: add link to documentation in i915_gem_execbuffer.c

2018-04-06 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Add the documentation of "DOC: User command execution" of
i915_gem_execbuffer.c into a new section in i915.rst.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com>

---
 Documentation/gpu/i915.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index cd2d796d23dd..34d22f275708 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -364,6 +364,12 @@ Batchbuffer Pools
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
:internal:
 
+User Batchbuffer Execution
+--------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
 
-- 
2.16.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v5 3/3] i915: add a text about what happens at bottom of stack in processing a batchbuffer

2018-04-06 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Now that "DOC: User command execution" of i915_gem_execbuffer.c is included
in the i915.rst, it is beneficial (for new developers) to read what happens
at the bottom of the driver stack (in terms of bytes written to be read
by the GPU) when processing a user-space batchbuffer.

v5:
  Typo correction of lacking double tick.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahti...@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db8495d..407a4a8ec61e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -81,6 +81,35 @@ enum {
  * but this remains just a hint as the kernel may choose a new location for
  * any object in the future.
  *
+ * At the level of talking to the hardware, submitting a batchbuffer for the
+ * GPU to execute is to add content to a buffer from which the HW
+ * command streamer is reading.
+ *
+ * 1. Add a command to load the HW context. For Logical Ring Contexts, i.e.
+ *    Execlists, this command is not placed on the same buffer as the
+ *    remaining items.
+ *
+ * 2. Add a command to invalidate caches to the buffer.
+ *
+ * 3. Add a batchbuffer start command to the buffer; the start command is
+ *    essentially a token together with the GPU address of the batchbuffer
+ *    to be executed.
+ *
+ * 4. Add a pipeline flush to the buffer.
+ *
+ * 5. Add a memory write command to the buffer to record when the GPU
+ *    is done executing the batchbuffer. The memory write writes the
+ *    global sequence number of the request, ``i915_request::global_seqno``;
+ *    the i915 driver uses the current value in the register to determine
+ *    if the GPU has completed the batchbuffer.
+ *
+ * 6. Add a user interrupt command to the buffer. This command instructs
+ *    the GPU to issue an interrupt when the command, pipeline flush and
+ *    memory write are completed.
+ *
+ * 7. Inform the hardware of the additional commands added to the buffer
+ *    (by updating the tail pointer).
+ *
  * Processing an execbuf ioctl is conceptually split up into a few phases.
  *
  * 1. Validation - Ensure all the pointers, handles and flags are valid.
-- 
2.16.2
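
As a reading aid for the seven steps documented in this patch, the sequence can be sketched as a toy ring buffer in plain C. This is an illustration under assumptions, not driver code: the `toy_*` names and command tokens are hypothetical, every token is stored as one 64-bit word for simplicity, the context load is placed on the same ring (which the comment says is not the case for Execlists), and free-space accounting is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command tokens; real opcodes differ per GPU generation. */
enum toy_op {
	TOY_LOAD_CONTEXT = 1,
	TOY_INVALIDATE_CACHES,
	TOY_BB_START,
	TOY_PIPELINE_FLUSH,
	TOY_WRITE_SEQNO,
	TOY_USER_INTERRUPT,
};

#define TOY_RING_WORDS 64

struct toy_ring {
	uint64_t cmds[TOY_RING_WORDS];
	size_t emit;	/* next slot the driver writes */
	size_t tail;	/* last position published to the hardware */
};

/* Append one word; the toy does not check for unread commands. */
static void toy_emit(struct toy_ring *ring, uint64_t word)
{
	ring->cmds[ring->emit] = word;
	ring->emit = (ring->emit + 1) % TOY_RING_WORDS;
}

static void toy_submit(struct toy_ring *ring, uint64_t ctx_id,
		       uint64_t batch_addr, uint64_t seqno)
{
	toy_emit(ring, TOY_LOAD_CONTEXT);	/* 1. load the HW context */
	toy_emit(ring, ctx_id);
	toy_emit(ring, TOY_INVALIDATE_CACHES);	/* 2. invalidate caches */
	toy_emit(ring, TOY_BB_START);		/* 3. token + batch address */
	toy_emit(ring, batch_addr);
	toy_emit(ring, TOY_PIPELINE_FLUSH);	/* 4. pipeline flush */
	toy_emit(ring, TOY_WRITE_SEQNO);	/* 5. record completion */
	toy_emit(ring, seqno);
	toy_emit(ring, TOY_USER_INTERRUPT);	/* 6. raise an interrupt */
	ring->tail = ring->emit;		/* 7. publish via the tail */
}
```

Nothing is visible to the (imagined) hardware until the last line runs: the commands are written first and the tail is moved only once they are all in place.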



[Intel-gfx] [PATCH v5 0/3] Documentation patch for batchbuffer submission

2018-04-06 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

v5:
   In the interest of getting reviewed patches in sooner, the last two
   patches were dropped for a later series.

   Permute patch order of 2 and 3 (this is what the v4 series should
   have had).

v4:
   Drop some details from narration to provide better focus.
   (suggested/requested by Chris Wilson)

   Spelling and grammar fixes.
   (errors caught by Tvrtko Ursulin)

   Move "at the bottom" details to part of documentation in
   i915_gem_execbuffer.c.
   (suggested by Chris Wilson)

   Place additional documentation of intel_engine_cs function
   pointers inline in the struct.

   Fix documentation error about lazy creation of intel_ringbuffer
   and backing store in intel_lrc.c.
   (Caught by Chris Wilson)

v3:
   More documentation: emphasize that handling of batchbuffer
   requests creates a request struct that is added to the
   dependency tree and that informing the hardware that
   there is new data on a ringbuffer is deferred until dependencies
   are satisfied.

   Break patch into more digestible chunks.

v2:
   More documentation: intel_ringbuffer, sequence number.
   Expose to i915.rst existing documentation

   Call out GEM_EXECBUFFER as deprecated.
   Place code detailed documentation in source files.
   Call out INTEL_EXEC_RENDER.
   Reorder text to make it more readable.
   Refer to Command Buffer Parser instead of Batchbuffer Parser.
   (suggested by Joonas Lahtinen)

Kevin Rogovin (3):
  i915.rst: Narration overview on GEM + minor reorder to improve
    narration
  i915.rst: add link to documentation in i915_gem_execbuffer.c
  i915: add a text about what happens at bottom of stack in processing a
    batchbuffer

 Documentation/gpu/i915.rst | 126 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  29 +++
 2 files changed, 132 insertions(+), 23 deletions(-)

-- 
2.16.2



[Intel-gfx] [PATCH v4 0/5] Documentation patch for batchbuffer submission

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Note: I want to make one or two follow-up series that provide
narration and potentially additional documentation for GUC submission
and the breadcrumbs.

v4:
   Drop some details from narration to provide better focus.
   (suggested/requested by Chris Wilson)

   Spelling and grammar fixes.
   (errors caught by Tvrtko Ursulin)

   Move "at the bottom" details to part of documentation in
   i915_gem_execbuffer.c.
   (suggested by Chris Wilson)

   Place additional documentation of intel_engine_cs function
   pointers inline in the struct.

   Fix documentation error about lazy creation of intel_ringbuffer
   and backing store in intel_lrc.c.
   (Caught by Chris Wilson)

v3:
   More documentation: emphasize that handling of batchbuffer
   requests creates a request struct that is added to the
   dependency tree and that informing the hardware that
   there is new data on a ringbuffer is deferred until dependencies
   are satisfied.

   Break patch into more digestible chunks.

v2:
   More documentation: intel_ringbuffer, sequence number.
   Expose to i915.rst existing documentation

   Call out GEM_EXECBUFFER as deprecated.
   Place code detailed documentation in source files.
   Call out INTEL_EXEC_RENDER.
   Reorder text to make it more readable.
   Refer to Command Buffer Parser instead of Batchbuffer Parser.
   (suggested by Joonas Lahtinen)

Kevin Rogovin (5):
  i915.rst: Narration overview on GEM + minor reorder to improve
    narration
  i915: add a text about what happens at bottom of stack in processing a
    batchbuffer
  i915.rst: add link to documentation in i915_gem_execbuffer.c
  i915: correct lazy ringbuffer and backing store documentation
  i915: add documentation to intel_engine_cs

 Documentation/gpu/i915.rst | 122 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  22 ++
 drivers/gpu/drm/i915/i915_vma.h|  10 ++-
 drivers/gpu/drm/i915/intel_lrc.c   |  13 +--
 drivers/gpu/drm/i915/intel_ringbuffer.h|  29 +++
 5 files changed, 158 insertions(+), 38 deletions(-)

-- 
2.16.2



[Intel-gfx] [PATCH v4 5/5] i915: add documentation to intel_engine_cs

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Add documentation to a number of the function pointer fields of
intel_engine_cs.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1f50727a5ddb..eafd1690acde 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -426,23 +426,52 @@ struct intel_engine_cs {
 
 void(*set_default_submission)(struct intel_engine_cs *engine);
 
+   /* In addition to pinning the context, returns the intel_ringbuffer
+* to which to write commands.
+*/
struct intel_ring *(*context_pin)(struct intel_engine_cs *engine,
  struct i915_gem_context *ctx);
void(*context_unpin)(struct intel_engine_cs *engine,
 struct i915_gem_context *ctx);
+
+   /* Request room on the ringbuffer of a request in order to write
+* commands for a request; In addition, if necessary, add commands
+* to the buffer so that the i915_gem_context of the request
+* is the one active for the commands.
+*/
int (*request_alloc)(struct i915_request *rq);
+
+   /* Called only once (and only if non-NULL) for an engine; used to
+* initialize the global driver default context.
+*/
int (*init_context)(struct i915_request *rq);
 
+   /* Add a GPU command to cache invalidate with EMIT_INVALIDATE,
+* to pipeline flush with EMIT_FLUSH or to do both with EMIT_BARRIER;
+* the GPU command is added to the buffer holding the commands of
+* the request (i.e. calling intel_ring_begin() on
+* i915_request::ring).
+*/
int (*emit_flush)(struct i915_request *request, u32 mode);
 #define EMIT_INVALIDATE BIT(0)
 #define EMIT_FLUSH      BIT(1)
 #define EMIT_BARRIER    (EMIT_INVALIDATE | EMIT_FLUSH)
+   /* Add a batchbuffer start command; the GPU command is added to
+* the buffer holding the commands of the request (i.e. calling
+* intel_ring_begin() on i915_request::ring).
+*/
int (*emit_bb_start)(struct i915_request *rq,
 u64 offset, u32 length,
 unsigned int dispatch_flags);
 #define I915_DISPATCH_SECURE BIT(0)
 #define I915_DISPATCH_PINNED BIT(1)
 #define I915_DISPATCH_RS BIT(2)
+   /* Add a memory write command that writes the global sequence number
+* (i915_request::global_seqno) and also add an interrupt command;
+* the GPU command is added to the buffer holding the commands of
+* the request (i.e. calling intel_ring_begin() on
+* i915_request::ring).
+*/
void(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
int emit_breadcrumb_sz;
 
-- 
2.16.2
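
To illustrate how the `emit_flush` mode bits documented above compose, here is a small self-contained sketch in C. The `TOY_`-prefixed macros mirror the bit layout shown in the patch, but the counter struct and `toy_emit_flush()` are hypothetical stand-ins: the real hook writes GPU commands into the request's ring rather than counting them.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BIT(n)		(1u << (n))
#define TOY_EMIT_INVALIDATE	TOY_BIT(0)
#define TOY_EMIT_FLUSH		TOY_BIT(1)
#define TOY_EMIT_BARRIER	(TOY_EMIT_INVALIDATE | TOY_EMIT_FLUSH)

/* Counts what would have been emitted; a stand-in for ring writes. */
struct toy_flush_cmds {
	int invalidates;
	int flushes;
};

/* Each mode bit is tested independently, so BARRIER requests both. */
static void toy_emit_flush(struct toy_flush_cmds *out, uint32_t mode)
{
	if (mode & TOY_EMIT_INVALIDATE)
		out->invalidates++;
	if (mode & TOY_EMIT_FLUSH)
		out->flushes++;
}
```

Because the barrier mode is simply the OR of the two single-purpose bits, a backend only has to handle the two bits and gets the combined case for free.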



[Intel-gfx] [PATCH v4 2/5] i915: add a text about what happens at bottom of stack in processing a batchbuffer

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Now that "DOC: User command execution" of i915_gem_execbuffer.c is included
in the i915.rst, it is beneficial (for new developers) to read what happens
at the bottom of the driver stack (in terms of bytes written to be read
by the GPU) when processing a user-space batchbuffer.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db8495d..1fe5da1fed47 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -81,6 +81,28 @@ enum {
  * but this remains just a hint as the kernel may choose a new location for
  * any object in the future.
  *
+ * At the level of talking to the hardware, submitting a batchbuffer for the
+ * GPU to execute is to add content to a buffer from which the HW
+ * command streamer is reading.
+ * 1. Add a command to load the HW context. For Logical Ring Contexts, i.e.
+ *    Execlists, this command is not placed on the same buffer as the
+ *    remaining items.
+ * 2. Add a command to invalidate caches to the buffer.
+ * 3. Add a batchbuffer start command to the buffer; the start command is
+ *    essentially a token together with the GPU address of the batchbuffer
+ *    to be executed.
+ * 4. Add a pipeline flush to the buffer.
+ * 5. Add a memory write command to the buffer to record when the GPU
+ *    is done executing the batchbuffer. The memory write writes the
+ *    global sequence number of the request, ``i915_request::global_seqno``;
+ *    the i915 driver uses the current value in the register to determine
+ *    if the GPU has completed the batchbuffer.
+ * 6. Add a user interrupt command to the buffer. This command instructs
+ *    the GPU to issue an interrupt when the command, pipeline flush and
+ *    memory write are completed.
+ * 7. Inform the hardware of the additional commands added to the buffer
+ *    (by updating the tail pointer).
+ *
  * Processing an execbuf ioctl is conceptually split up into a few phases.
  *
  * 1. Validation - Ensure all the pointers, handles and flags are valid.
-- 
2.16.2



[Intel-gfx] [PATCH v4 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Add a narration to i915.rst about Intel GEN GPUs: engines,
driver context and relocation.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst  | 116 
 drivers/gpu/drm/i915/i915_vma.h |  10 ++--
 2 files changed, 100 insertions(+), 26 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..00f897f67f85 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,99 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS is the engine for rendering 3D and performing compute; it is named `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; it is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine; it is named `I915_EXEC_BSD` in user space.
+- VECS is a video enhancement engine; it is named `I915_EXEC_VEBOX` in user space.
+- The enumeration `I915_EXEC_DEFAULT` does not refer to a specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
+
+The Intel GPU family is a family of integrated GPUs using Unified
+Memory Access. To have the GPU "do work", user space feeds the
+GPU batchbuffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the ioctl
+`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
+to execute will also list all GEM buffer objects that the batchbuffer reads
+and/or writes. For implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+The i915 driver allows user space to create a context via the ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
+integer. Such a context should be viewed by user-space as -loosely-
+analogous to the idea of a CPU process of an operating system. The i915
+driver guarantees that commands issued to a fixed context are to be
+executed so that writes of a previously issued command are seen by
+reads of following commands. Actions issued between different contexts
+(even if from the same file descriptor) are NOT given that guarantee
+and the only way to synchronize across contexts (even from the same
+file descriptor) is through the use of fences. At least as far back as
+Gen4, a context also carries with it a GPU HW context;
+the HW context is essentially (at least most of) the state of a GPU.
+In addition to the ordering guarantees, the kernel will restore GPU
+state via the HW context when commands are issued to a context; this saves
+user space the need to restore (at least most of) the GPU state at the
+start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
+is used by user space to create a hardware context which is identified
+by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
+work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
+to identify what context to use with the command.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each context has its own translation table, called Per-Process
+Graphics Translation Table (PPGTT). Importantly, although the
+PPGTT is named per-process, it is actually per context. When user space
+submits a batchbuffer, the kernel walks the list of GEM buffer objects
+used by the batchbuffer and guarantees that not only is the memory of
+each such GEM buffer object resident but it is also present in the
+(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
+then it is given an address. Two consequences of this are: the kernel
+needs to edit the batchbuffer submitted to write the correct value of
+the GPU address when a GEM BO is assigned a GPU address and the kernel
+might evict a different GEM BO from the (PP)GTT to make address room
+for another GEM BO. Consequently, the ioctls submitting a batchbuffer
+for execution also include a list of all locations within buffers that
+refer to GPU-addresses so that the kernel can edit the buffer correctly.
+This process is dubbed relocation.
+
+GEM BO Management Implementation Details
+
+
+.. kernel-doc:: drivers/g

[Intel-gfx] [PATCH v4 3/5] i915.rst: add link to documentation in i915_gem_execbuffer.c

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Add the documentation of "DOC: User command execution" of
i915_gem_execbuffer.c into a new section in i915.rst.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 00f897f67f85..efe15406f322 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -360,6 +360,12 @@ Batchbuffer Pools
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
:internal:
 
+User Batchbuffer Execution
+--------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
 
-- 
2.16.2



[Intel-gfx] [PATCH v4 4/5] i915: correct lazy ringbuffer and backing store documentation

2018-04-03 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Correct documentation of logical ring context implementation to note
that ringbuffer and backing store are created lazily for all context
types (driver global, local default context and local extra context).

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..382a4656a1d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -70,22 +70,11 @@
  * - One ringbuffer per-engine inside each context.
  * - One backing object per-engine inside each context.
  *
- * The global default context starts its life with these new objects fully
- * allocated and populated. The local default context for each opened fd is
- * more complex, because we don't know at creation time which engine is going
- * to use them. To handle this, we have implemented a deferred creation of LR
- * contexts:
- *
- * The local context starts its life as a hollow or blank holder, that only
- * gets populated for a given engine once we receive an execbuffer. If later
+ * which are populated for a given engine once we receive an execbuffer. If later
  * on we receive another execbuffer ioctl for the same context but a different
  * engine, we allocate/populate a new ringbuffer and context backing object and
  * so on.
  *
- * Finally, regarding local contexts created using the ioctl call: as they are
- * only allowed with the render ring, we can allocate & populate them right
- * away (no need to defer anything, at least for now).
- *
  * Execlists implementation:
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
-- 
2.16.2



[Intel-gfx] [PATCH v3 4/5] i915: add doc for synchronization

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/i915_request.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.h 
b/drivers/gpu/drm/i915/i915_request.h
index 7d6eb82..093b9d7 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -57,6 +57,34 @@ struct i915_dependency {
 #define I915_DEPENDENCY_ALLOC BIT(0)
 };
 
+/**
+ * DOC: Synchronization
+ *
+ * The i915 kernel driver also needs to synchronize the order in which
+ * batchbuffers are submitted in the following cases.
+ *
+ * 1. When a fixed file descriptor is used to submit work to different
+ *    engines of the GPU; the issue is that the different engines run
+ *    independently of each other but the kernel interface guarantees
+ *    that the memory writes of previously issued commands within a
+ *    fixed file descriptor will be seen by the next command.
+ *
+ * 2. When an ioctl to submit a batchbuffer has an in-fence that represents
+ *    an action that needs to be completed or submitted before the
+ *    given command is submitted.
+ *
+ * Rather than waiting, the i915 kernel driver builds a data structure tree
+ * (represented by ``i915_priotree``) to store dependencies of different
+ * batchbuffer execute requests; the requests themselves are tracked via
+ * ``i915_request`` structs. The critical point is that processing
+ * a user request does NOT wait for the dependencies to finish or even be
+ * submitted to the GPU. Instead, an ``i915_request`` structure is created
+ * and it is added to the dependency tree; when its dependencies are
+ * satisfied, the callback updates the tail pointer of the ring buffer
+ * where the command was placed. This way, execute batchbuffer requests
+ * return from the kernel almost immediately.
+ */
+
 /*
  * "People assume that time is a strict progression of cause to effect, but
  * actually, from a nonlinear, non-subjective viewpoint, it's more like a big
-- 
2.7.4
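
The deferred-submission idea in the comment above can be sketched in a few lines of C. This is a toy model under stated assumptions, not the driver's data structures: `toy_request` and its helpers are hypothetical names, a dependency counts as satisfied only when its signaler completes, and the `submitted` flag stands in for updating the ring tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4

struct toy_request {
	int pending;	/* dependencies not yet satisfied */
	bool queued;	/* the submitting call has already returned */
	bool submitted;	/* stands in for "ring tail updated" */
	struct toy_request *waiters[TOY_MAX_WAITERS];
	size_t nwaiters;
};

/* Submit only once the request is queued and nothing blocks it. */
static void toy_maybe_submit(struct toy_request *rq)
{
	if (rq->queued && rq->pending == 0)
		rq->submitted = true;
}

/* Record that rq must not be submitted before signaler is done. */
static int toy_await(struct toy_request *rq, struct toy_request *signaler)
{
	if (signaler->nwaiters == TOY_MAX_WAITERS)
		return -1;
	signaler->waiters[signaler->nwaiters++] = rq;
	rq->pending++;
	return 0;
}

/* Queueing never blocks: it submits now or leaves it to the callback. */
static void toy_queue(struct toy_request *rq)
{
	rq->queued = true;
	toy_maybe_submit(rq);
}

/* Completion callback: release every request waiting on rq. */
static void toy_signal(struct toy_request *rq)
{
	for (size_t i = 0; i < rq->nwaiters; i++) {
		rq->waiters[i]->pending--;
		toy_maybe_submit(rq->waiters[i]);
	}
	rq->nwaiters = 0;
}
```

In this model `toy_queue()` returns immediately whether or not the request can run yet, which mirrors the statement that execute batchbuffer requests return from the kernel almost immediately.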



[Intel-gfx] [PATCH v3 5/5] i915.rst: narration for batchbuffer submission + links to source code doc entries on the subject

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst | 58 +++---
 1 file changed, 49 insertions(+), 9 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index ed8e08d..b23d5c9 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -355,6 +355,55 @@ objects, which has the goal to make space in gpu virtual address spaces.
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
:internal:
 
+Batchbuffer Submission
+----------------------
+
+Depending on GPU generation, the i915 kernel driver will submit batchbuffers
+in one of several ways. However, the top-level code logic is shared for all
+methods, see `Common: At the bottom`_ and `Common: Processing requests`_
+for details. In addition, the kernel may filter the contents of user space
+provided batchbuffers. To that end the i915 driver has a
+`Batchbuffer Parsing`_ and a pool from which to allocate buffers to place
+filtered user space batchbuffers, see section `Batchbuffer Pools`_.
+
+Common: At the bottom
+~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Ringbuffers to submit batchbuffers
+
+Common: Synchronization
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_request.h
+   :doc: Synchronization
+
+Common: Processing requests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
+Batchbuffer Submission Varieties
+
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Batchbuffer Submission Backend
+
+The two varieties for submitting batchbuffers to the GPU are the following.
+
+1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is for generations strictly before Gen8.
+2. Batchbuffers are submitted via execlists, a feature supported by Gen8 and newer devices; the macro :c:macro:`HAS_EXECLISTS` is used to determine if a GPU supports submitting via execlists, see `Logical Rings, Logical Ring Contexts and Execlists`_.
+
+Logical Rings, Logical Ring Contexts and Execlists
+--------------------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
+   :doc: Logical Rings, Logical Ring Contexts and Execlists
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
+   :internal:
+
 Batchbuffer Parsing
 ---
 
@@ -373,15 +422,6 @@ Batchbuffer Pools
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
:internal:
 
-Logical Rings, Logical Ring Contexts and Execlists
---------------------------------------------------
-
-.. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
-   :doc: Logical Rings, Logical Ring Contexts and Execlists
-
-.. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
-   :internal:
-
 Global GTT views
 
 
-- 
2.7.4



[Intel-gfx] [PATCH v3 3/5] i915: add documentation for a bit on batchbuffer submission backend

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 38 +
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 62d3a22..2f8908e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -350,6 +350,44 @@ struct intel_engine_execlists {
  *the ringbuffer.
  */
 
+/**
+ * DOC: Batchbuffer Submission Backend
+ *
+ * The core logic of submitting a batchbuffer for the GPU to execute
+ * is shared across all engines for all GPU generations. Through the use
+ * of function pointers, we can customize submission to different GPU
+ * capabilities. The struct ``intel_engine_cs`` has the following member
+ * function pointers for the following purposes in the scope of batchbuffer
+ * submission.
+ *
+ * - context_pin
+ * pins the context and also returns the ``intel_ringbuffer``
+ * to which to write in order to submit a batchbuffer.
+ *
+ * - request_alloc
+ * is used to reserve space in an ``intel_ringbuffer``
+ * for submitting a batchbuffer to the GPU.
+ *
+ * - emit_flush
+ * writes a pipeline flush command and/or a cache invalidation
+ * command to the ring buffer.
+ *
+ * - emit_bb_start
+ * writes the batchbuffer start command to the ring buffer.
+ *
+ * - emit_breadcrumb
+ * writes to the ring buffer both the register write of the
+ * request ID (``i915_request::global_seqno``) and the command to
+ * issue an interrupt.
+ *
+ * - submit_request
+ * See the comment on this member in ``intel_engine_cs``, declared
+ * in intel_ringbuffer.h.
+ *
+ * In addition, the struct i915_request is used to track requests'
+ * dependency tree.
+ */
+
 struct intel_engine_cs {
struct drm_i915_private *i915;
char name[INTEL_ENGINE_CS_MAX_NAME];
-- 
2.7.4
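
For readers following the DOC comment in this patch, here is a minimal sketch of the function-pointer pattern it describes: one shared submission path, with per-backend hooks supplying the details. Everything here is an assumption made for illustration: the `toy_*` types, the opcode values and the split into a "ring" and an "lrc" backend are hypothetical and are not the real ``intel_engine_cs`` layout or command encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; none of these are the real i915 types or opcodes. */
enum { TOY_RING_WORDS = 64 };
enum {
    TOY_OP_FLUSH = 1,
    TOY_OP_BB_START_RING = 2,   /* hypothetical "pre-Gen8" encoding */
    TOY_OP_BB_START_LRC = 3,    /* hypothetical "Gen8+" encoding */
    TOY_OP_STORE_SEQNO = 4,
    TOY_OP_USER_IRQ = 5
};

struct toy_engine;

/* Per-backend hooks, mirroring the roles of the members in the DOC comment. */
struct toy_engine_ops {
    void (*emit_flush)(struct toy_engine *e);
    void (*emit_bb_start)(struct toy_engine *e, uint32_t batch_addr);
    void (*emit_breadcrumb)(struct toy_engine *e, uint32_t seqno);
};

struct toy_engine {
    uint32_t ring[TOY_RING_WORDS];
    size_t tail;                        /* next free word in ring[] */
    const struct toy_engine_ops *ops;
};

static void toy_emit(struct toy_engine *e, uint32_t word)
{
    assert(e->tail < TOY_RING_WORDS);   /* no wrap handling in this sketch */
    e->ring[e->tail++] = word;
}

static void toy_flush(struct toy_engine *e)
{
    toy_emit(e, TOY_OP_FLUSH);
}

static void toy_bb_start_ring(struct toy_engine *e, uint32_t addr)
{
    toy_emit(e, TOY_OP_BB_START_RING);
    toy_emit(e, addr);
}

static void toy_bb_start_lrc(struct toy_engine *e, uint32_t addr)
{
    toy_emit(e, TOY_OP_BB_START_LRC);
    toy_emit(e, addr);
}

static void toy_breadcrumb(struct toy_engine *e, uint32_t seqno)
{
    toy_emit(e, TOY_OP_STORE_SEQNO);    /* register write of the request ID */
    toy_emit(e, seqno);
    toy_emit(e, TOY_OP_USER_IRQ);       /* ask for an interrupt */
}

static const struct toy_engine_ops toy_ring_ops = {
    toy_flush, toy_bb_start_ring, toy_breadcrumb
};
static const struct toy_engine_ops toy_lrc_ops = {
    toy_flush, toy_bb_start_lrc, toy_breadcrumb
};

/* Shared core: identical for every backend; only the hooks differ. */
static void toy_submit(struct toy_engine *e, uint32_t batch_addr, uint32_t seqno)
{
    e->ops->emit_flush(e);
    e->ops->emit_bb_start(e, batch_addr);
    e->ops->emit_breadcrumb(e, seqno);
}
```

A caller would zero-initialise a `struct toy_engine`, point `ops` at one of the two tables and call `toy_submit()`; the only difference between the backends in this sketch is the word written by `emit_bb_start`.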



[Intel-gfx] [PATCH v3 0/6] Documentation patch for batchbuffer submission

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Note: I want to make one or two follow-up series that provide
narration and potentially additional documentation for GuC submission
and the breadcrumbs.

v3:
   More documentation: emphasize that handling of batchbuffer
   requests creates a request struct that is added to the
   dependency tree and that informing the hardware that
   there is new data on a ringbuffer is deferred until dependencies
   are satisfied.

   Break patch into more digestible chunks.

v2:
   More documentation: intel_ringbuffer, sequence number.
   Expose to i915.rst existing documentation

   Call out GEM_EXECBUFFER as deprecated.
   Place code detailed documentation in source files.
   Call out I915_EXEC_RENDER.
   Reorder text to make it more readable.
   Refer to Command Buffer Parser instead of Batchbuffer Parser.
   (suggested by Joonas Lahtinen)

Kevin Rogovin (6):
  i915.rst: Narration overview on GEM + minor reorder to improve
narration
  i915.rst: rename section Batchbuffer Parsing to Command Buffer Parsing
  i915: add documentation of what happens at the bottom of submitting a
batchbuffer
  i915: add documentation for a bit on batchbuffer submission backend
  i915: add doc for synchronization
  i915.rst: narration for batchbuffer submission + links to source code
doc entries on the subject

 Documentation/gpu/i915.rst  | 189 ++--
 drivers/gpu/drm/i915/i915_request.h |  28 +
 drivers/gpu/drm/i915/i915_vma.h |  10 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  71 
 4 files changed, 262 insertions(+), 36 deletions(-)

-- 
2.16.2
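
The v3 note about deferring the hardware notification until dependencies are satisfied can be illustrated with a small sketch. This is a toy model under stated assumptions, not the i915 request code: `struct toy_request`, the fixed-size waiter array and all function names are hypothetical. It only shows the idea that a request is handed to the "hardware" once it has been queued and every request it waits on has completed.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_WAITERS = 4 };

/* Hypothetical request record; the real i915_request is far richer. */
struct toy_request {
    unsigned pending;    /* dependencies that have not completed yet */
    bool queued;         /* construction of the request has finished */
    bool submitted;      /* the "hardware" has been told about it */
    bool completed;
    struct toy_request *waiters[TOY_MAX_WAITERS];
    size_t nwaiters;
};

/* Submit only when queued and no dependency is outstanding. */
static void toy_maybe_submit(struct toy_request *rq)
{
    if (rq->queued && rq->pending == 0)
        rq->submitted = true;
}

/* Record that rq must wait for dep; returns false if dep has no free slot. */
static bool toy_await(struct toy_request *rq, struct toy_request *dep)
{
    if (dep->completed)
        return true;                    /* nothing to wait for */
    if (dep->nwaiters == TOY_MAX_WAITERS)
        return false;
    dep->waiters[dep->nwaiters++] = rq;
    rq->pending++;
    return true;
}

static void toy_queue(struct toy_request *rq)
{
    rq->queued = true;
    toy_maybe_submit(rq);
}

/* Completion of one request may unblock the requests waiting on it. */
static void toy_complete(struct toy_request *rq)
{
    rq->completed = true;
    for (size_t i = 0; i < rq->nwaiters; i++) {
        struct toy_request *w = rq->waiters[i];
        w->pending--;
        toy_maybe_submit(w);
    }
    rq->nwaiters = 0;
}
```

With two zero-initialised requests `a` and `b`, calling `toy_await(&b, &a)` and then queueing both leaves `b` unsubmitted until `toy_complete(&a)` runs.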



[Intel-gfx] [PATCH v3 1/5] i915.rst: Narration overview on GEM + minor reorder to improve narration

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst  | 129 +---
 drivers/gpu/drm/i915/i915_vma.h |  10 +++-
 2 files changed, 113 insertions(+), 26 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881..ed8e08d 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,112 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+
+
+An Intel GPU has multiple engines. There are several engine types.
+The user-space value `I915_EXEC_DEFAULT` is an alias to the user
+space value `I915_EXEC_RENDER`.
+
+- RCS is an engine for rendering 3D and performing compute; this is named
+  `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user
+  space.
+- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD`
+  in user space.
+- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in
+  user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified
+Memory Access. To have the GPU "do work", user space feeds the GPU
+batchbuffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR` (the ioctl
+`DRM_IOCTL_I915_GEM_EXECBUFFER` is deprecated). Most such batchbuffers
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the
+ioctl `DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for
+the GPU to execute also lists all GEM buffer objects that the
+batchbuffer reads and/or writes. For implementation details of memory
+management see `GEM BO Management Implementation Details`_.
+
+A GPU pipeline (most strongly so for the RCS engine) has a great deal
+of state which is to be programmed by user space via the contents of a
+batchbuffer. Starting in Gen6 (SandyBridge), hardware contexts are
+supported. A hardware context encapsulates GPU pipeline state and other
+portions of GPU state and it is much more efficient for the GPU to load
+a hardware context instead of re-submitting commands in a batchbuffer to
+the GPU to restore state. In addition, using hardware contexts provides
+much better isolation between user space clients. The ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` is used by user space to create a
+hardware context which is identified by a 32-bit integer. The
+non-deprecated ioctls to submit batchbuffer work can pass that ID (in
+the lower bits of drm_i915_gem_execbuffer2::rsvd1) to identify what HW
+context to use with the command. When the kernel submits the batchbuffer
+to be executed by the GPU it will also instruct the GPU to load the HW
+context prior to executing the contents of a batchbuffer.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each hardware context has its own translation table, called
+Per-Process Graphics Translation Table (PPGTT). It is important to note
+that although PPGTT is named per-process, it is actually per hardware
+context. When user space submits a batchbuffer, the kernel walks the
+list of GEM buffer objects used by the batchbuffer and guarantees that
+not only is the memory of each such GEM buffer object resident but it
+is also present in the (PP)GTT. If the GEM buffer object is not yet
+placed in the (PP)GTT, then it is given an address. Two consequences
+of this are: the kernel needs to edit the batchbuffer submitted to
+write the correct value of the GPU address when a GEM BO is assigned a
+GPU address and the kernel might evict a different GEM BO from the
+(PP)GTT to make address room for a GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also
+include a list of all locations within buffers that refer to
+GPU-addresses so that the kernel can edit the buffer correctly. This
+process is dubbed relocation. The ioctls allow user space to provide to
+the kernel a presumed offset for each GEM buffer object used in a
+batchbuffer. If the kernel sees that the address provided by user space
+is correct, then it skips performing relocation for that GEM buffer
+object. In addition, the kernel reports back to user space the address
+to which it relocates each GEM buffer object.
+
+There is also an interface for user space to directly specify the
+address location of GEM BOs; this feature is called soft-pinning and is
+made active within an execbuffer2 ioctl by setting the
+`EXEC_OBJECT_PINNED` bit. If
+user-space also specifi

[Intel-gfx] [PATCH v3 2/5] i915: add documentation of what happens at the bottom of submitting a batchbuffer

2018-03-27 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1f50727..62d3a22 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -317,6 +317,39 @@ struct intel_engine_execlists {
 
 #define INTEL_ENGINE_CS_MAX_NAME 8
 
+/**
+ * DOC: Ringbuffers to submit batchbuffers
+ *
+ * At the lowest level, submitting work to a GPU engine is to add commands to
+ * a ringbuffer. A ringbuffer in the kernel driver is essentially a location
+ * from which the GPU reads its next command. To avoid copying the contents
+ * of a batchbuffer in order to submit it, the GPU has native hardware support
+ * to perform commands specified in another buffer; the command to do so is
+ * a batchbuffer start and the i915 kernel driver uses this to avoid copying
+ * batchbuffers to the ringbuffer. At the very bottom of the stack, the i915
+ * driver does the following to submit a batchbuffer to the GPU.
+ *
+ * 1. Add a command to invalidate caches to the ringbuffer.
+ *
+ * 2. Add a batchbuffer start command to the ringbuffer. The start command is
+ *    essentially a token together with the GPU address of the batchbuffer to
+ *    be executed.
+ *
+ * 3. Add a pipeline flush to the ringbuffer.
+ *
+ * 4. Add a register write command to the ringbuffer. This register write
+ *    writes the request ID, ``i915_request::global_seqno``; the i915
+ *    kernel driver uses the value in the register to know what requests are
+ *    completed.
+ *
+ * 5. Add a user interrupt command to the ringbuffer. This command instructs
+ *    the GPU to issue an interrupt when the command (and pipeline flush) are
+ *    completed.
+ *
+ * 6. Inform the hardware of the additional commands added to
+ *    the ringbuffer.
+ */
+
 struct intel_engine_cs {
struct drm_i915_private *i915;
char name[INTEL_ENGINE_CS_MAX_NAME];
-- 
2.7.4
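
The six steps in the DOC comment of this patch can be sketched as a toy circular buffer. This is only an illustration under assumptions: the `demo_*` names and opcode values are hypothetical, the "hardware" is just a second tail index, and the sketch does not track the read position, so it will happily overwrite commands that a real GPU had not consumed yet.

```c
#include <stdint.h>

/* Power-of-two size so that wrapping is a simple mask. */
enum { DEMO_RING_WORDS = 16 };

/* Made-up command tokens; not real GPU opcodes. */
enum {
    DEMO_INVALIDATE = 0x10,
    DEMO_BB_START,
    DEMO_PIPE_FLUSH,
    DEMO_STORE_SEQNO,
    DEMO_USER_INTERRUPT
};

struct demo_ring {
    uint32_t words[DEMO_RING_WORDS];
    uint32_t sw_tail;   /* where the driver writes the next word */
    uint32_t hw_tail;   /* last tail value published to the "hardware" */
};

static void demo_emit(struct demo_ring *r, uint32_t word)
{
    r->words[r->sw_tail] = word;
    r->sw_tail = (r->sw_tail + 1) & (DEMO_RING_WORDS - 1);
}

static void demo_submit_batch(struct demo_ring *r, uint32_t batch_gpu_addr,
                              uint32_t seqno)
{
    demo_emit(r, DEMO_INVALIDATE);      /* 1. invalidate caches */
    demo_emit(r, DEMO_BB_START);        /* 2. token + batch address */
    demo_emit(r, batch_gpu_addr);
    demo_emit(r, DEMO_PIPE_FLUSH);      /* 3. pipeline flush */
    demo_emit(r, DEMO_STORE_SEQNO);     /* 4. write the request ID */
    demo_emit(r, seqno);
    demo_emit(r, DEMO_USER_INTERRUPT);  /* 5. raise an interrupt */
    r->hw_tail = r->sw_tail;            /* 6. tell the hardware */
}
```

Nothing written in steps 1 to 5 is visible to the consumer until step 6 publishes the new tail, which is why the batchbuffer itself never has to be copied into the ring.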



[Intel-gfx] [PATCH v2 1/1] i915: additional GEM documentation

2018-03-02 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

This patch provides additional overview documentation to the
i915 kernel driver GEM. In addition, it presents already written
documentation to i915.rst as well.

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst | 194 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   3 +-
 drivers/gpu/drm/i915/i915_vma.h|  11 +-
 drivers/gpu/drm/i915/intel_lrc.c   |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h|  64 ++
 5 files changed, 235 insertions(+), 40 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..cd23da2793ec 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@ Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+The real action of initialization for the i915 driver is handled by
+:c:func:`i915_driver_load`; from this function one can see the key
+data (in particular :c:struct:`drm_driver` for GEM) of the entry points
+to the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 
 
@@ -243,32 +255,148 @@ Display PLLs
 .. kernel-doc:: drivers/gpu/drm/i915/intel_dpll_mgr.h
    :internal:
 
-Memory Management and Command Submission
-
+GEM: Memory Management and Command Submission
+=============================================
 
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
-Batchbuffer Parsing

+Intel GPU Basics
+
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
-   :doc: batch buffer command parser
+An Intel GPU has multiple engines. There are several engine types.
+The user-space value `I915_EXEC_DEFAULT` is an alias to the user
+space value `I915_EXEC_RENDER`.
+
+- RCS is an engine for rendering 3D and performing compute; this is named
+  `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user
+  space.
+- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD`
+  in user space.
+- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in
+  user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified
+Memory Access. To have the GPU "do work", user space feeds the GPU
+batchbuffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR` (the ioctl
+`DRM_IOCTL_I915_GEM_EXECBUFFER` is deprecated). Most such batchbuffers
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the
+ioctl `DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for
+the GPU to execute also lists all GEM buffer objects that the
+batchbuffer reads and/or writes. For implementation details of memory
+management see `GEM BO Management Implementation Details`_.
+
+A GPU pipeline (most strongly so for the RCS engine) has a great deal of
+state which is to be programmed by user space via the contents of a
+batchbuffer. Starting in Gen6 (SandyBridge), hardware contexts are
+supported. A hardware context encapsulates GPU pipeline state and other
+portions of GPU state and it is much more efficient for the GPU to load
+a hardware context instead of re-submitting commands in a batchbuffer to
+the GPU to restore state. In addition, using hardware contexts provides
+much better isolation between user space clients. The ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` is used by user space to create a
+hardware context which is identified by a 32-bit integer. The
+non-deprecated ioctls to submit batchbuffer work can pass that ID (in
+the lower bits of drm_i915_gem_execbuffer2::rsvd1) to identify what HW
+context to use with the command. When the kernel submits the batchbuffer
+to be executed by the GPU it will also instruct the GPU to load the HW
+context prior to executing the contents of a batchbuffer.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single global such translation table, a global
+Graphics Translation Table (GTT). For newer generation GPUs each hardware
+context has its own translation table, called Per-Process Graphics Translation
+Table (PPGTT). It is important to note that although PPGTT is named
+per-process, it is actually per hardware context. When user space
+submits a batchbuffer, the kernel
+walks the list of GEM buffer objects used by the batchbuffer and gua

[Intel-gfx] [PATCH v2 0/1] Documentation patch for batchbuffer submission

2018-03-02 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

v2:
   More documentation: intel_ringbuffer, sequence number.
   Expose to i915.rst existing documentation

   Call out GEM_EXECBUFFER as deprecated.
   Place code detailed documentation in source files.
   Call out I915_EXEC_RENDER.
   Reorder text to make it more readable.
   Refer to Command Buffer Parser instead of Batchbuffer Parser.
   (suggested by Joonas Lahtinen)





Kevin Rogovin (1):
  i915: additional documentation

 Documentation/gpu/i915.rst | 194 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   3 +-
 drivers/gpu/drm/i915/i915_vma.h|  11 +-
 drivers/gpu/drm/i915/intel_lrc.c   |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h|  64 ++
 5 files changed, 235 insertions(+), 40 deletions(-)

-- 
2.16.2



[Intel-gfx] [PATCH 0/1 RFC] Documentation patch for batchbuffer submission

2018-02-16 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

This patch attempts to provide text for the kernel documentation on the
subject of batchbuffer submission. The contents of the patch are essentially:
 - explain the nature of (PP)GTT and relocation with respect to GPU executing
   batchbuffers (as narrative) and
 - add a little more text on the implementation of batchbuffer submission
   (a little text and also including in i915.rst the already existing text
   in i915_gem_execbuffer.c).

In coming weeks I want to also provide documentation on the rules for handling
the lists, the breadcrumb jazz, locking rules to be followed in i915 and what
the IRQs handled by i915 end up triggering with respect to GEM.

Comments (even flames) are welcome, especially for checking the accuracy of the
text written. I freely confess that I am using the act of writing the documentation
to get myself deeply familiar with the way i915 GEM works.

Any requests to write documentation for how GEM works I will gladly take up
as well.

Kevin Rogovin (1):
  drivers/gpu/drm/i915:Documentation for batchbuffer submission

 Documentation/gpu/i915.rst | 109 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +++
 2 files changed, 119 insertions(+)

-- 
2.16.1
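
As a reading aid for the relocation narrative in this series, here is a minimal sketch of the idea: user space supplies a presumed GPU address per buffer object, and the kernel patches the batchbuffer only where that guess is wrong, then reports the address it actually used. The structures below are hypothetical simplifications invented for this example and are not the real execbuffer2 uapi; addresses are stored as plain 64-bit values in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-buffer record. */
struct toy_bo {
    uint64_t presumed_offset;   /* user space's guess at the GPU address */
    uint64_t actual_offset;     /* address chosen by the "kernel" */
};

/* Hypothetical relocation entry: one place in the batch holding an address. */
struct toy_reloc {
    size_t batch_byte_offset;   /* where in the batch the address lives */
    size_t target;              /* index of the buffer it refers to */
};

/* Patch the batch where needed; returns how many locations were rewritten. */
static size_t toy_relocate(uint8_t *batch,
                           struct toy_bo *bos, size_t nbo,
                           const struct toy_reloc *relocs, size_t nreloc)
{
    size_t patched = 0;

    for (size_t i = 0; i < nreloc; i++) {
        const struct toy_bo *bo = &bos[relocs[i].target];

        if (bo->presumed_offset == bo->actual_offset)
            continue;           /* the guess was right: skip relocation */

        memcpy(batch + relocs[i].batch_byte_offset,
               &bo->actual_offset, sizeof(bo->actual_offset));
        patched++;
    }

    /* Report the addresses used, so the next submission can guess right. */
    for (size_t i = 0; i < nbo; i++)
        bos[i].presumed_offset = bos[i].actual_offset;

    return patched;
}
```

The write-back happens in a second pass on purpose: updating `presumed_offset` inside the first loop would make later relocation entries that target the same buffer look already correct and be skipped.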



[Intel-gfx] [PATCH 1/1 RFC] drivers/gpu/drm/i915:Documentation for batchbuffer submission

2018-02-16 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogo...@intel.com>
---
 Documentation/gpu/i915.rst | 109 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +++
 2 files changed, 119 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..36b3ade85839 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@ Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+The real action of initialization for the i915 driver is handled by
+:c:func:`i915_driver_load`; from this function one can see the key
+data (in particular :c:struct:`drm_driver` for GEM) of the entry points
+to the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 
 
@@ -249,6 +261,102 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS is an engine for rendering 3D and performing compute; this is named
+  `I915_EXEC_DEFAULT` in user space.
+- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user
+  space.
+- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD`
+  in user space.
+- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in
+  user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified Memory
+Access. To have the GPU "do work", user space feeds the GPU batchbuffers
+via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`,
+`DRM_IOCTL_I915_GEM_EXECBUFFER2` or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`.
+Most such batchbuffers instruct the GPU to perform work (for example
+rendering) and that work needs memory from which to read and memory to
+which to write. All memory is encapsulated within GEM buffer objects
+(usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE). An ioctl
+providing a batchbuffer for the GPU to execute also lists all GEM buffer
+objects that the batchbuffer reads and/or writes.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single global such translation table, a global
+Graphics Translation Table (GTT). For newer generation GPUs each hardware
+context has its own translation table, called Per-Process Graphics Translation
+Table (PPGTT). It is important to note that although PPGTT is named
+per-process, it is actually per hardware context. When user space submits
+a batchbuffer, the kernel
+walks the list of GEM buffer objects used by the batchbuffer and guarantees
+that not only is the memory of each such GEM buffer object resident but it is
+also present in the (PP)GTT. If the GEM buffer object is not yet placed in
+the (PP)GTT, then it is given an address. Two consequences of this are:
+the kernel needs to edit the batchbuffer submitted to write the correct
+value of the GPU address when a GEM BO is assigned a GPU address and
+the kernel might evict a different GEM BO from the (PP)GTT to make address
+room for a GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also include
+a list of all locations within buffers that refer to GPU-addresses so that the
+kernel can edit the buffer correctly. This process is dubbed relocation. The
+ioctls allow user space to provide what the GPU address could be. If the kernel
+sees that the address provided by user space is correct, then it skips
+performing relocation for that GEM buffer object. In addition, the ioctls
+report the addresses to which the kernel relocates each GEM buffer object.
+
+There is also an interface for user space to directly specify the address
+location of GEM BOs; this feature is called soft-pinning and is made active
+within an execbuffer2 ioctl by setting the EXEC_OBJECT_PINNED bit. If
+user space also specifies I915_EXEC_NO_RELOC, then the kernel does not
+perform any relocation and user space manages the address space for its
+PPGTT itself. The advantage of user space handling the address space is
+that the kernel does far less work and user space can safely assume that
+the locations of GEM buffer objects in GPU address space do not change.
+
+Starting in Gen6, Intel GPUs support hardware contexts. A GPU hardware context
+represents GPU state that can be saved and restored. When user space uses a
+hardware context, it does not need to restore the GPU state at the start of
+each batchbuffer because the kernel directs the GPU to load the state from
+the hardware context.
+Hardware contexts allow for much greater isolation between proce

[Intel-gfx] [PATCH 3/3] i965: implement (per-context) scratch page checking

2017-12-08 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 src/mesa/drivers/dri/i965/brw_context.c   | 30 +++
 src/mesa/drivers/dri/i965/brw_context.h   | 10 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 15 +-
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 126c187f62..8e1afdc859 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -844,6 +844,7 @@ brwCreateContext(gl_api api,
 {
    struct gl_context *shareCtx = (struct gl_context *) sharedContextPrivate;
    struct intel_screen *screen = driContextPriv->driScreenPriv->driverPrivate;
+   __DRIscreen *dri_screen = screen->driScrnPriv;
    const struct gen_device_info *devinfo = &screen->devinfo;
    struct dd_function_table functions;
 
@@ -1078,6 +1079,30 @@ brwCreateContext(gl_api api,
 
    brw_disk_cache_init(brw);
 
+   brw->scratch.size = 0;
+   if (brw->hw_ctx && (INTEL_DEBUG & DEBUG_CHECK_SCRATCH_PAGE)) {
+      struct drm_i915_gem_context_param p;
+      int ret;
+
+      p.ctx_id = brw->hw_ctx;
+      p.size = 0;
+      p.param = I915_CONTEXT_PARAM_SCRATCH_PAGE_CONTENTS;
+      p.value = 0;
+      ret = drmIoctl(dri_screen->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);
+      if (ret == 0 && p.size > 0) {
+         brw->scratch.size = p.size;
+         brw->scratch.values = calloc(brw->scratch.size, 1);
+         brw->scratch.tmp = calloc(brw->scratch.size, 1);
+         for (uint64_t i = 0; i < brw->scratch.size; ++i) {
+            brw->scratch.values[i] = rand() & 0xFF;
+         }
+         p.value = (__u64) (uintptr_t) brw->scratch.values;
+         ret = drmIoctl(dri_screen->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
+         assert(ret == 0);
+         assert(p.size == brw->scratch.size);
+      }
+   }
+
    return true;
 }
 
@@ -1140,6 +1165,11 @@ intelDestroyContext(__DRIcontext * driContextPriv)
 
    driDestroyOptionCache(&brw->optionCache);
 
+   if (brw->scratch.size > 0) {
+      free(brw->scratch.values);
+      free(brw->scratch.tmp);
+   }
+
    /* free the Mesa context */
    _mesa_free_context_data(&brw->ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0f0aad8534..e73eab9160 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1263,6 +1263,16 @@ struct brw_context
     */
    bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
 
+   /* Checking the scratch page to detect out-of-bounds writes
+    * by the GPU; a zero value on the scratch size indicates
+    * that scratch page checking is not enabled.
+    */
+   struct {
+      uint8_t *values;
+      uint8_t *tmp;
+      uint64_t size;
+   } scratch;
+
    __DRIcontext *driContext;
    struct intel_screen *screen;
 };
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 91a6506a89..2e4e7747d0 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -936,9 +936,22 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
 
    ret = submit_batch(brw, in_fence_fd, out_fence_fd);
 
-   if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) {
+   if (unlikely((INTEL_DEBUG & DEBUG_SYNC) || brw->scratch.size > 0)) {
       fprintf(stderr, "waiting for idle\n");
       brw_bo_wait_rendering(brw->batch.batch.bo);
+      if (brw->scratch.size > 0) {
+         struct drm_i915_gem_context_param p;
+         int ret;
+
+         p.ctx_id = brw->hw_ctx;
+         p.size = brw->scratch.size;
+         p.param = I915_CONTEXT_PARAM_SCRATCH_PAGE_CONTENTS;
+         p.value = (__u64) (uintptr_t) brw->scratch.tmp;
+         ret = drmIoctl(brw->screen->driScrnPriv->fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);
+         assert(ret == 0);
+         assert(p.size == brw->scratch.size);
+         assert(memcmp(brw->scratch.tmp, brw->scratch.values, brw->scratch.size) == 0);
+      }
    }
 
    /* Start a new batch buffer. */
-- 
2.15.0



[Intel-gfx] [PATCH 1/3] drm-uapi:i915 define set/get (per-context) scratch page ioctl interface

2017-12-08 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 include/drm-uapi/i915_drm.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 7f28eea403..e75eea058b 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -1412,6 +1412,15 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY   1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY    0
 #define   I915_CONTEXT_MIN_USER_PRIORITY   -1023 /* inclusive */
+/* Notes for set/get of I915_CONTEXT_PARAM_SCRATCH_PAGE_CONTENTS:
+ *  1) for the duration of set/get, the caller must guarantee
+ *     that nothing can read/write the scratch page
+ *  2) on set and get, ioctl will write to drm_i915_gem_context_param::size
+ *     the size of the scratch page
+ *  3) on get, the first drm_i915_gem_context_param::size bytes of the
+ *     scratch page contents are written to the buffer that value points to.
+ */
+#define I915_CONTEXT_PARAM_SCRATCH_PAGE_CONTENTS 0x7
__u64 value;
 };
 
-- 
2.15.0



[Intel-gfx] [PATCH 2/3] i965: define gen_debug for checking (per-context) scratch page

2017-12-08 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 src/intel/common/gen_debug.c | 1 +
 src/intel/common/gen_debug.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c
index a978f2f581..b8bb8b1395 100644
--- a/src/intel/common/gen_debug.c
+++ b/src/intel/common/gen_debug.c
@@ -85,6 +85,7 @@ static const struct debug_control debug_control[] = {
{ "nohiz",   DEBUG_NO_HIZ },
{ "color",   DEBUG_COLOR },
{ "reemit",  DEBUG_REEMIT },
+   { "check_scratch", DEBUG_CHECK_SCRATCH_PAGE },
{ NULL,0 }
 };
 
diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index da5b5a569d..c1532d951e 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -83,6 +83,7 @@ extern uint64_t INTEL_DEBUG;
 #define DEBUG_NO_HIZ  (1ull << 39)
 #define DEBUG_COLOR   (1ull << 40)
 #define DEBUG_REEMIT  (1ull << 41)
+#define DEBUG_CHECK_SCRATCH_PAGE  (1ull << 42)
 
 #ifdef HAVE_ANDROID_PLATFORM
 #define LOG_TAG "INTEL-MESA"
-- 
2.15.0



[Intel-gfx] [PATCH 0/3] RFC: v2 set/get scratch page contents

2017-12-08 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

This patch series for Mesa defines and uses a new ioctl interface
to get and set the contents of the scratch page for a PPGTT.
The purpose of checking that the scratch page is not changed is
to help detect out-of-bound buffer object writes by the GPU.

v2:
 - Correctly change to using GEM_CONTEXT_GET/SETPARAM and thus
   make scratch page get/set as per-context
 - place checking of scratch page in same location in Mesa/i965
   as syncing

Kevin Rogovin (3):
  drm-uapi:i915 define set/get (per-context) scratch page ioctl
interface
  i965: define gen_debug for checking (per-context) scratch page
  i965: implement (per-context) scratch page checking

 include/drm-uapi/i915_drm.h   |  9 
 src/intel/common/gen_debug.c  |  1 +
 src/intel/common/gen_debug.h  |  1 +
 src/mesa/drivers/dri/i965/brw_context.c   | 30 +++
 src/mesa/drivers/dri/i965/brw_context.h   | 10 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 15 +-
 6 files changed, 65 insertions(+), 1 deletion(-)

-- 
2.15.0
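
The checking scheme this series proposes (fill the scratch page with known random bytes, then verify after the GPU goes idle that they are unchanged) can be sketched without any kernel interface. In this toy version a plain byte array stands in for the scratch page, `memcpy` stands in for the proposed SETPARAM call and `memcmp` on the array stands in for GETPARAM followed by a compare; the `toy_scratch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for one checked scratch page. */
struct toy_scratch_check {
    uint8_t *expected;  /* copy of the pattern written to the page */
    size_t size;        /* 0 means checking is not armed */
};

/* Fill the page with a random pattern and remember it. */
static bool toy_scratch_arm(struct toy_scratch_check *c,
                            uint8_t *page, size_t size)
{
    c->size = 0;
    c->expected = malloc(size);
    if (!c->expected)
        return false;

    for (size_t i = 0; i < size; i++)
        c->expected[i] = (uint8_t)(rand() & 0xFF);

    memcpy(page, c->expected, size);    /* stands in for the "set" ioctl */
    c->size = size;
    return true;
}

/* True while nothing has written to the page since it was armed. */
static bool toy_scratch_intact(const struct toy_scratch_check *c,
                               const uint8_t *page)
{
    return memcmp(page, c->expected, c->size) == 0;
}

static void toy_scratch_release(struct toy_scratch_check *c)
{
    free(c->expected);
    c->expected = NULL;
    c->size = 0;
}
```

A random pattern is used rather than zeros so that a stray write of zeros, a common out-of-bounds value, is still likely to be detected.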



[Intel-gfx] [PATCH 0/3] RFC: Scratch page checking

2017-12-04 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

This patch series proposes a new kernel interface for user
space to read and write the values of the scratch page for
a PPGTT. The user space is expected to guarantee (via its
own locking mechanism) that nothing shall read or write to
the scratch page for the duration of the ioctls to read
or write the scratch page values. The purpose for user space
to read and write the scratch page values is to help see if
an out-of-bound write was done to the scratch page by the
GPU; from the point of view of GL, this can be used to help
detect if an application performs an out-of-bounds write to
an SSBO.

Patch 1 defines the kernel interface
Patches 2-3 implement the debug option in i965 to check for
inadvertent scratch page writes by the GPU.
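
As a rough illustration of the checking idea described above (seed the
scratch page with known noise, let the GPU run, then compare), here is a
minimal user-space sketch. It does not call the proposed ioctls: the
scratch page is simulated by an ordinary buffer, and the names
fake_scratch, scratch_seed and scratch_first_mismatch are hypothetical
and do not appear in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PPGTT scratch page. In the proposed series
 * the real page lives in the kernel and is only reachable via ioctls. */
struct fake_scratch {
   uint8_t *page;
   size_t size;
};

/* Fill the reference buffer with pseudo-random bytes and copy it into
 * the simulated scratch page; this models the "write" step. */
static void
scratch_seed(struct fake_scratch *s, uint8_t *noise, unsigned seed)
{
   assert(s != NULL && s->page != NULL && noise != NULL);
   srand(seed);
   for (size_t i = 0; i < s->size; ++i)
      noise[i] = (uint8_t)(rand() & 0xFF);
   memcpy(s->page, noise, s->size);
}

/* Compare the simulated scratch page against the reference noise; this
 * models the "read back and check" step. Returns the offset of the first
 * differing byte, or -1 if the page is unchanged. */
static long
scratch_first_mismatch(const struct fake_scratch *s, const uint8_t *noise)
{
   for (size_t i = 0; i < s->size; ++i) {
      if (s->page[i] != noise[i])
         return (long)i;
   }
   return -1;
}
```

A caller would seed once, submit work, wait for the GPU to go idle, and
then treat any return value other than -1 as evidence of a stray write.
Reporting the offset rather than a plain yes/no is a choice made for this
sketch only; the patches themselves use memcmp inside an assert.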

Kevin Rogovin (3):
  drm-uapi: define interface to kernel for scratch page read
  i965: define stuff for scratch page checking in intel_screen
  i965: check scratch page in a locked fashion on each ioctl

 include/drm-uapi/i915_drm.h   | 31 +++
 src/intel/common/gen_debug.c  |  1 +
 src/intel/common/gen_debug.h  |  1 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 27 ++-
 src/mesa/drivers/dri/i965/intel_screen.c  | 26 ++
 src/mesa/drivers/dri/i965/intel_screen.h  | 12 +++
 6 files changed, 97 insertions(+), 1 deletion(-)

-- 
2.15.0



[Intel-gfx] [PATCH 2/3] i965: define stuff for scratch page checking in intel_screen

2017-12-04 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 src/intel/common/gen_debug.c |  1 +
 src/intel/common/gen_debug.h |  1 +
 src/mesa/drivers/dri/i965/intel_screen.c | 26 ++
 src/mesa/drivers/dri/i965/intel_screen.h | 12 
 4 files changed, 40 insertions(+)

diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c
index f58c593c44..7bd6723311 100644
--- a/src/intel/common/gen_debug.c
+++ b/src/intel/common/gen_debug.c
@@ -84,6 +84,7 @@ static const struct debug_control debug_control[] = {
{ "nohiz",   DEBUG_NO_HIZ },
{ "color",   DEBUG_COLOR },
{ "reemit",  DEBUG_REEMIT },
+   { "check_scratch", DEBUG_CHECK_SCRATH },
{ NULL,0 }
 };
 
diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index e418e3fb16..5e224a45f0 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -83,6 +83,7 @@ extern uint64_t INTEL_DEBUG;
 #define DEBUG_NO_HIZ  (1ull << 39)
 #define DEBUG_COLOR   (1ull << 40)
 #define DEBUG_REEMIT  (1ull << 41)
+#define DEBUG_CHECK_SCRATH  (1ull << 42)
 
 #ifdef HAVE_ANDROID_PLATFORM
 #define LOG_TAG "INTEL-MESA"
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index 38769babf0..044be8fe85 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1557,6 +1557,12 @@ intelDestroyScreen(__DRIscreen * sPriv)
    brw_bufmgr_destroy(screen->bufmgr);
    driDestroyOptionInfo(&screen->optionCache);
 
+   if (screen->debug_batchbuffer.enabled) {
+      simple_mtx_destroy(&screen->debug_batchbuffer.mutex);
+      free(screen->debug_batchbuffer.noise_values);
+      free(screen->debug_batchbuffer.tmp);
+   }
+
    ralloc_free(screen);
    sPriv->driverPrivate = NULL;
 }
@@ -2610,6 +2616,26 @@ __DRIconfig **intelInitScreen2(__DRIscreen *dri_screen)
   }
}
 
+   screen->debug_batchbuffer.enabled = false;
+   if (INTEL_DEBUG & DEBUG_CHECK_SCRATH) {
+      struct drm_i915_scratch_page sc;
+      int err;
+
+      sc.buffer_size = 0;
+      sc.buffer_ptr = 0;
+      err = drmIoctl(dri_screen->fd, DRM_IOCTL_I915_READ_SCRATCH_PAGE, &sc);
+      if (err == 0) {
+         screen->debug_batchbuffer.enabled = true;
+         simple_mtx_init(&screen->debug_batchbuffer.mutex, mtx_plain);
+         screen->debug_batchbuffer.buffer_size = sc.buffer_size;
+         screen->debug_batchbuffer.noise_values = calloc(screen->debug_batchbuffer.buffer_size, 1);
+         screen->debug_batchbuffer.tmp = calloc(screen->debug_batchbuffer.buffer_size, 1);
+         for (uint64_t i = 0; i < screen->debug_batchbuffer.buffer_size; ++i) {
+            screen->debug_batchbuffer.noise_values[i] = rand() & 0xFF;
+         }
+      }
+   }
+
    return (const __DRIconfig**) intel_screen_make_configs(dri_screen);
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
index 7948617b7f..7d56106aa2 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -37,6 +37,7 @@
 #include "common/gen_device_info.h"
 #include "i915_drm.h"
 #include "util/xmlconfig.h"
+#include "util/simple_mtx.h"
 
 #include "isl/isl.h"
 
@@ -114,6 +115,17 @@ struct intel_screen
 */
int eu_total;
 
+   /**
+    * Struct to perform out-of-bound GEM BO write checking
+    */
+   struct {
+      bool enabled;
+      simple_mtx_t mutex;
+      uint32_t buffer_size;
+      uint8_t *noise_values;
+      uint8_t *tmp;
+   } debug_batchbuffer;
+
bool mesa_format_supports_texture[MESA_FORMAT_COUNT];
bool mesa_format_supports_render[MESA_FORMAT_COUNT];
enum isl_format mesa_to_isl_render_format[MESA_FORMAT_COUNT];
-- 
2.15.0



[Intel-gfx] [PATCH 1/3] drm-uapi: define interface to kernel for scratch page read

2017-12-04 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 include/drm-uapi/i915_drm.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/drm-uapi/i915_drm.h b/include/drm-uapi/i915_drm.h
index 890df227ae..3a9c3a2d0c 100644
--- a/include/drm-uapi/i915_drm.h
+++ b/include/drm-uapi/i915_drm.h
@@ -262,6 +262,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_OPEN             0x36
 #define DRM_I915_PERF_ADD_CONFIG       0x37
 #define DRM_I915_PERF_REMOVE_CONFIG    0x38
+#define DRM_I915_READ_SCRATCH_PAGE     0x39
+#define DRM_I915_WRITE_SCRATCH_PAGE    0x40
 
 #define DRM_IOCTL_I915_INIT            DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH           DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -319,6 +321,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_OPEN           DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG     DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG  DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
+#define DRM_IOCTL_I915_READ_SCRATCH_PAGE   DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_READ_SCRATCH_PAGE, struct drm_i915_scratch_page)
+#define DRM_IOCTL_I915_WRITE_SCRATCH_PAGE  DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_WRITE_SCRATCH_PAGE, struct drm_i915_scratch_page)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1535,6 +1539,33 @@ struct drm_i915_perf_oa_config {
__u64 flex_regs_ptr;
 };
 
+/**
+ * Structure to read/write the scratch page of a PPGTT. Reading and
+ * writing values is not reliable unless the calling application
+ * guarantees that no batchbuffer that could read or write the scratch
+ * page is in flight using the PPGTT between the time the ioctl is
+ * issued and the time it returns.
+ */
+struct drm_i915_scratch_page {
+   /**
+    * Size in bytes of the backing store pointed to by buffer_ptr;
+    * the kernel will return the actual size of the scratch page in
+    * this field as well.
+    */
+   __u32 buffer_size;
+
+   /**
+    * Pointer to the data to upload to or download from the
+    * scratch page; if the buffer size behind buffer_ptr is
+    * smaller than the scratch page size, then only the first
+    * buffer_size bytes are read or written. If the scratch
+    * page size is smaller than buffer_size, then the bytes
+    * past the scratch page size in the buffer behind buffer_ptr
+    * are not read or written.
+    */
+   __u64 buffer_ptr;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.15.0
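
The size semantics documented for struct drm_i915_scratch_page can be
modelled in plain user-space C. The sketch below is only a model of the
behaviour the comment describes (copy at most min(buffer_size, page size)
bytes, then report the real page size); it is not kernel code, and
model_scratch_arg and model_read_scratch are hypothetical names that do
not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space mirror of the argument struct in the patch.
 * The patch passes the buffer as a __u64 user pointer; a plain pointer
 * is used here to keep the model simple. */
struct model_scratch_arg {
   uint32_t buffer_size;
   uint8_t *buffer;
};

/* Model of the documented read behaviour: copy the first
 * min(buffer_size, page_size) bytes of the page into the caller's
 * buffer, leave any remaining buffer bytes untouched, and return the
 * actual page size in buffer_size. */
static void
model_read_scratch(const uint8_t *page, uint32_t page_size,
                   struct model_scratch_arg *arg)
{
   uint32_t n = arg->buffer_size < page_size ? arg->buffer_size : page_size;

   if (n > 0) {
      assert(arg->buffer != NULL);
      memcpy(arg->buffer, page, n);
   }
   arg->buffer_size = page_size;
}
```

Under this model, a call with buffer_size set to 0 copies nothing and
simply reports the page size, which matches how patch 2/3 sizes its
buffers before the first real read.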



[Intel-gfx] [PATCH 3/3] i965: check scratch page in a locked fashion on each ioctl

2017-12-04 Thread kevin . rogovin
From: Kevin Rogovin <kevin.rogo...@intel.com>

---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 216073129b..53b3eaf49b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -804,7 +804,8 @@ static int
 submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
 {
    const struct gen_device_info *devinfo = &brw->screen->devinfo;
-   __DRIscreen *dri_screen = brw->screen->driScrnPriv;
+   struct intel_screen *screen = brw->screen;
+   __DRIscreen *dri_screen = screen->driScrnPriv;
    struct intel_batchbuffer *batch = &brw->batch;
    int ret = 0;
 
@@ -875,10 +876,34 @@ submit_batch(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
  batch->validation_list[index] = tmp;
   }
 
+      if (unlikely(screen->debug_batchbuffer.enabled)) {
+         simple_mtx_lock(&screen->debug_batchbuffer.mutex);
+      }
+
       ret = execbuffer(dri_screen->fd, batch, hw_ctx,
                        4 * USED_BATCH(*batch),
                        in_fence_fd, out_fence_fd, flags);
 
+      if (unlikely(screen->debug_batchbuffer.enabled)) {
+         struct drm_i915_scratch_page sc;
+         int ret;
+
+         while (brw_bo_busy(batch->bo)) {
+            usleep(10);
+         }
+
+         sc.buffer_size = screen->debug_batchbuffer.buffer_size;
+         sc.buffer_ptr = (__u64)(uintptr_t) screen->debug_batchbuffer.tmp;
+
+         ret = drmIoctl(dri_screen->fd, DRM_IOCTL_I915_READ_SCRATCH_PAGE, &sc);
+         assert(ret == 0);
+         assert(sc.buffer_size == screen->debug_batchbuffer.buffer_size);
+         assert(memcmp(screen->debug_batchbuffer.tmp,
+                       screen->debug_batchbuffer.noise_values,
+                       screen->debug_batchbuffer.buffer_size) == 0);
+         simple_mtx_unlock(&screen->debug_batchbuffer.mutex);
+      }
+
+
   throttle(brw);
}
 
-- 
2.15.0
