Re: [Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-28 Thread Dave Gordon

On 27/01/16 13:50, Chris Wilson wrote:

On Wed, Jan 27, 2016 at 01:13:54PM +, Daniele Ceraolo Spurio wrote:



On 27/01/16 09:38, Chris Wilson wrote:

On Wed, Jan 27, 2016 at 08:55:40AM +, daniele.ceraolospu...@intel.com wrote:

From: Daniele Ceraolo Spurio 

While running some tests on the scheduler patches with rpm enabled I
came across a corruption in the ringbuffer, which was root-caused to
the GPU being suspended while commands were being emitted to the
ringbuffer. The access to memory was failing because the GPU needs to
be awake when accessing stolen memory (where my ringbuffer was located).
Since we have this constraint it looks like a sensible idea to check that
we hold a refcount when we emit commands.

Cc: John Harrison 
Signed-off-by: Daniele Ceraolo Spurio 
---
  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3761eaf..f9e8d74 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
if (ret)
return ret;
+   // If the ringbuffer is in stolen memory we need to be sure that the
+   // gpu is awake before writing to it
+   if (req->ringbuf->obj->stolen && num_dwords > 0)
+   assert_rpm_wakelock_held(dev_priv);

The assertion you want is that when iomapping through the GTT that we
hold a wakeref.
-Chris


If I'm not missing anything, we iomap the ringbuffer at request
allocation time;


Strictly, the ring is pinned whilst we access it for writing the
request, i.e. only during request construction. It can be unpinned at any
point afterwards. It is unpinned late today to paper over a few other
issues with context pinning and the overhead of having to do the iomap.


however, with the scheduler a request could
potentially wait in the queue for a time long enough to allow RPM to
kick in, especially if the request is waiting on a fence object
coming from a different driver. In this situation the rpm reference
taken to cover the request allocation would have already been
released and so we need to ensure that a new one has been taken
before writing to the ringbuffer; that's why I originally placed the
assert in ring_begin.


No, once the request is queued we are not modifying the ring. If the
scheduler needs to manipulate it (which it shouldn't) then it has to
acquire its own pin for its access (or extend the original pinning to
suit which would also be less than ideal).
-Chris


Different sense of "queued"; Daniele meant "queued in the scheduler, not 
yet written to the ringbuffer or known to the hardware", whereas Chris 
presumably means "queued in the ringbuffer, already visible to the 
hardware".


We may yet decide to have the scheduler do the iomap later, after it's 
selected the request for dispatch and therefore just before we start 
writing into the ringbuffer; but at present it doesn't because error 
recovery is harder there.


.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-27 Thread Daniel Vetter
On Wed, Jan 27, 2016 at 01:50:17PM +, Chris Wilson wrote:
> On Wed, Jan 27, 2016 at 01:13:54PM +, Daniele Ceraolo Spurio wrote:
> > 
> > 
> > On 27/01/16 09:38, Chris Wilson wrote:
> > >On Wed, Jan 27, 2016 at 08:55:40AM +, daniele.ceraolospu...@intel.com wrote:
> > >>From: Daniele Ceraolo Spurio 
> > >>
> > >>While running some tests on the scheduler patches with rpm enabled I
> > >>came across a corruption in the ringbuffer, which was root-caused to
> > >>the GPU being suspended while commands were being emitted to the
> > >>ringbuffer. The access to memory was failing because the GPU needs to
> > >>be awake when accessing stolen memory (where my ringbuffer was located).
> > >>Since we have this constraint it looks like a sensible idea to check that
> > >>we hold a refcount when we emit commands.
> > >>
> > >>Cc: John Harrison 
> > >>Signed-off-by: Daniele Ceraolo Spurio 
> > >>---
> > >>  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
> > >>  1 file changed, 5 insertions(+)
> > >>
> > >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > >>index 3761eaf..f9e8d74 100644
> > >>--- a/drivers/gpu/drm/i915/intel_lrc.c
> > >>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> > >>@@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> > >>  if (ret)
> > >>  return ret;
> > >>+ // If the ringbuffer is in stolen memory we need to be sure that the
> > >>+ // gpu is awake before writing to it
> > >>+ if (req->ringbuf->obj->stolen && num_dwords > 0)
> > >>+ assert_rpm_wakelock_held(dev_priv);
> > >The assertion you want is that when iomapping through the GTT that we
> > >hold a wakeref.
> > >-Chris
> > 
> > If I'm not missing anything, we iomap the ringbuffer at request
> > allocation time;
> 
> Strictly, the ring is pinned whilst we access it for writing the
> request, i.e. only during request construction. It can be unpinned at any
> point afterwards. It is unpinned late today to paper over a few other
> issues with context pinning and the overhead of having to do the iomap.
> 
> > however, with the scheduler a request could
> > potentially wait in the queue for a time long enough to allow RPM to
> > kick in, especially if the request is waiting on a fence object
> > coming from a different driver. In this situation the rpm reference
> > taken to cover the request allocation would have already been
> > released and so we need to ensure that a new one has been taken
> > before writing to the ringbuffer; that's why I originally placed the
> > assert in ring_begin.
> 
> No, once the request is queued we are not modifying the ring. If the
> scheduler needs to manipulate it (which it shouldn't) then it has to
> acquire its own pin for its access (or extend the original pinning to
> suit which would also be less than ideal).

Yeah, with execlist all the scheduling should happen at the context level.
Well, all the scheduling should always happen at the context level, but
with execlist the hw makes it much smoother. Scheduler touching the rings
in the ctx object sounds like a bug.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-27 Thread daniele . ceraolospurio
From: Daniele Ceraolo Spurio 

While running some tests on the scheduler patches with rpm enabled I
came across a corruption in the ringbuffer, which was root-caused to
the GPU being suspended while commands were being emitted to the
ringbuffer. The access to memory was failing because the GPU needs to
be awake when accessing stolen memory (where my ringbuffer was located).
Since we have this constraint it looks like a sensible idea to check that
we hold a refcount when we emit commands.

Cc: John Harrison 
Signed-off-by: Daniele Ceraolo Spurio 
---
 drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3761eaf..f9e8d74 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
if (ret)
return ret;
 
+   // If the ringbuffer is in stolen memory we need to be sure that the
+   // gpu is awake before writing to it
+   if (req->ringbuf->obj->stolen && num_dwords > 0)
+   assert_rpm_wakelock_held(dev_priv);
+
req->ringbuf->space -= num_dwords * sizeof(uint32_t);
return 0;
 }
-- 
1.9.1



Re: [Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-27 Thread Chris Wilson
On Wed, Jan 27, 2016 at 08:55:40AM +, daniele.ceraolospu...@intel.com wrote:
> From: Daniele Ceraolo Spurio 
> 
> While running some tests on the scheduler patches with rpm enabled I
> came across a corruption in the ringbuffer, which was root-caused to
> the GPU being suspended while commands were being emitted to the
> ringbuffer. The access to memory was failing because the GPU needs to
> be awake when accessing stolen memory (where my ringbuffer was located).
> Since we have this constraint it looks like a sensible idea to check that
> we hold a refcount when we emit commands.
> 
> Cc: John Harrison 
> Signed-off-by: Daniele Ceraolo Spurio 
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3761eaf..f9e8d74 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
>   if (ret)
>   return ret;
>  
> + // If the ringbuffer is in stolen memory we need to be sure that the
> + // gpu is awake before writing to it
> + if (req->ringbuf->obj->stolen && num_dwords > 0)
> + assert_rpm_wakelock_held(dev_priv);

The assertion you want is that when iomapping through the GTT that we
hold a wakeref.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-27 Thread Chris Wilson
On Wed, Jan 27, 2016 at 01:13:54PM +, Daniele Ceraolo Spurio wrote:
> 
> 
> On 27/01/16 09:38, Chris Wilson wrote:
> >On Wed, Jan 27, 2016 at 08:55:40AM +, daniele.ceraolospu...@intel.com wrote:
> >>From: Daniele Ceraolo Spurio 
> >>
> >>While running some tests on the scheduler patches with rpm enabled I
> >>came across a corruption in the ringbuffer, which was root-caused to
> >>the GPU being suspended while commands were being emitted to the
> >>ringbuffer. The access to memory was failing because the GPU needs to
> >>be awake when accessing stolen memory (where my ringbuffer was located).
> >>Since we have this constraint it looks like a sensible idea to check that
> >>we hold a refcount when we emit commands.
> >>
> >>Cc: John Harrison 
> >>Signed-off-by: Daniele Ceraolo Spurio 
> >>---
> >>  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>index 3761eaf..f9e8d74 100644
> >>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>@@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
> >>if (ret)
> >>return ret;
> >>+   // If the ringbuffer is in stolen memory we need to be sure that the
> >>+   // gpu is awake before writing to it
> >>+   if (req->ringbuf->obj->stolen && num_dwords > 0)
> >>+   assert_rpm_wakelock_held(dev_priv);
> >The assertion you want is that when iomapping through the GTT that we
> >hold a wakeref.
> >-Chris
> 
> If I'm not missing anything, we iomap the ringbuffer at request
> allocation time;

Strictly, the ring is pinned whilst we access it for writing the
request, i.e. only during request construction. It can be unpinned at any
point afterwards. It is unpinned late today to paper over a few other
issues with context pinning and the overhead of having to do the iomap.

> however, with the scheduler a request could
> potentially wait in the queue for a time long enough to allow RPM to
> kick in, especially if the request is waiting on a fence object
> coming from a different driver. In this situation the rpm reference
> taken to cover the request allocation would have already been
> released and so we need to ensure that a new one has been taken
> before writing to the ringbuffer; that's why I originally placed the
> assert in ring_begin.

No, once the request is queued we are not modifying the ring. If the
scheduler needs to manipulate it (which it shouldn't) then it has to
acquire its own pin for its access (or extend the original pinning to
suit which would also be less than ideal).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [RFC] drm/i915: check that rpm ref is held when writing to ringbuf in stolen mem

2016-01-27 Thread Daniele Ceraolo Spurio



On 27/01/16 09:38, Chris Wilson wrote:

On Wed, Jan 27, 2016 at 08:55:40AM +, daniele.ceraolospu...@intel.com wrote:

From: Daniele Ceraolo Spurio 

While running some tests on the scheduler patches with rpm enabled I
came across a corruption in the ringbuffer, which was root-caused to
the GPU being suspended while commands were being emitted to the
ringbuffer. The access to memory was failing because the GPU needs to
be awake when accessing stolen memory (where my ringbuffer was located).
Since we have this constraint it looks like a sensible idea to check that
we hold a refcount when we emit commands.

Cc: John Harrison 
Signed-off-by: Daniele Ceraolo Spurio 
---
  drivers/gpu/drm/i915/intel_lrc.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3761eaf..f9e8d74 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1105,6 +1105,11 @@ int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
if (ret)
return ret;
  
+	// If the ringbuffer is in stolen memory we need to be sure that the
+   // gpu is awake before writing to it
+   if (req->ringbuf->obj->stolen && num_dwords > 0)
+   assert_rpm_wakelock_held(dev_priv);

The assertion you want is that when iomapping through the GTT that we
hold a wakeref.
-Chris


If I'm not missing anything, we iomap the ringbuffer at request 
allocation time; however, with the scheduler a request could potentially 
wait in the queue for a time long enough to allow RPM to kick in, 
especially if the request is waiting on a fence object coming from a 
different driver. In this situation the rpm reference taken to cover the 
request allocation would have already been released and so we need to 
ensure that a new one has been taken before writing to the ringbuffer; 
that's why I originally placed the assert in ring_begin.
Scheduler code is still in review anyway and subject to change, so I 
guess that until it reaches its final form there is no point in 
debating where to put a possible second assert :-)


I'll respin the patch with the assert at iomap time as you suggested.

Thanks,
Daniele