Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-03-01 Thread Daniele Ceraolo Spurio



On 26/02/18 23:50, Chris Wilson wrote:

Quoting Sagar Arun Kamble (2018-02-27 06:54:46)



On 2/27/2018 2:22 AM, Chris Wilson wrote:

Quoting Daniele Ceraolo Spurio (2018-02-26 16:57:11)

As you said we do always reset GuC no matter the value of the modparam,
but that does not reset the doorbell HW. If we're coming out of S3 and
the state has been preserved this will cause the doorbell HW to be left
in an unclean state, which could cause spurious doorbell interrupts to
be sent to GuC, not sure how the firmware handles those. The code has
moved since last time I looked at this in detail and I think we're now
most likely going to overwrite those unclean doorbells, but there are
unlikely corner cases (preempt context failing to be created) where we
might not do so.

I'm still going "wait, we can put the device into D3 and the GuC is
still powered?" Something feels wrong if the GuC retains state after the
HW is powered down.

GuC will be powered down with RC6. It is just that the firmware in WOPCM
can get wiped if memory is reset/powered down during resume. In the case
of mem sleep, WOPCM generally stays intact, and if we exit RC6 on resume
from sleep, the firmware will be restored into GuC without driver
intervention. But since we do a full GPU reset as part of sanitize, we
have to load it from the driver again.


On resume, we don't know if we are coming from module load, resume from
S3, resume S3+RST, resume from S4, or resume from kexec. (S3+RST, kexec
are truly without our knowledge, the others we could feed the
information through but RST makes that moot.) Ergo, you cannot know if
the right fw image is loaded and aiui you should treat the state as
undefined and always reload. Does that make sense? Is there a way you
can query what fw is loaded and so skip instead?
-Chris



Not sure if there is already a way to query the FW version, but we could
ask for a new H2G to be added if there isn't. However, even if the
firmware version matches, if we can't confirm it is exactly the one we
loaded (and not a reload of the same version) there is no guarantee that
its internal state is what we expect. Also, we would have to stop doing
full GPU resets around suspend/resume, which I doubt is something we want.


Daniele
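
For illustration only, here is a minimal sketch of the "treat the state as
undefined and always reload" approach discussed above. It is not the i915
code: the names fw_status, fw_state, fw_prepare_to_reset and fw_needs_load
are hypothetical and only model the idea in plain C. The driver-side status
is downgraded before any reset, so the next init path reloads the firmware
regardless of whether we came from module load, S3, S4 or kexec.

```c
#include <stdbool.h>

/* Hypothetical driver-side view of a firmware blob (not the i915 types). */
enum fw_status {
	FW_NONE,	/* no firmware selected for this device */
	FW_PENDING,	/* image fetched, must be (re)loaded into the HW */
	FW_LOADED,	/* driver believes the HW is running this image */
};

struct fw_state {
	enum fw_status status;
};

/*
 * Called before a reset or sanitize: forget what we believe the HW holds.
 * Only a LOADED status is downgraded; NONE stays NONE.
 */
static void fw_prepare_to_reset(struct fw_state *fw)
{
	if (fw->status == FW_LOADED)
		fw->status = FW_PENDING;
}

/* The init path reloads whenever the status is not known-good. */
static bool fw_needs_load(const struct fw_state *fw)
{
	return fw->status == FW_PENDING;
}
```

With this shape the init path never has to guess how the device was
suspended: anything that was LOADED before the reset is PENDING afterwards,
so fw_needs_load() returns true and the image is uploaded again.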

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-26 Thread Chris Wilson
Quoting Sagar Arun Kamble (2018-02-27 06:54:46)
> 
> 
> On 2/27/2018 2:22 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2018-02-26 16:57:11)
> >> As you said we do always reset GuC no matter the value of the modparam,
> >> but that does not reset the doorbell HW. If we're coming out of S3 and
> >> the state has been preserved this will cause the doorbell HW to be left
> >> in an unclean state, which could cause spurious doorbell interrupts to
> >> be sent to GuC, not sure how the firmware handles those. The code has
> >> moved since last time I looked at this in detail and I think we're now
> >> most likely going to overwrite those unclean doorbells, but there are
> >> unlikely corner cases (preempt context failing to be created) where we
> >> might not do so.
> > I'm still going "wait, we can put the device into D3 and the GuC is
> > still powered?" Something feels wrong if the GuC retains state after the
> > HW is powered down.
> GuC will be powered down with RC6. It is just that the firmware in WOPCM
> can get wiped if memory is reset/powered down during resume. In the case
> of mem sleep, WOPCM generally stays intact, and if we exit RC6 on resume
> from sleep, the firmware will be restored into GuC without driver
> intervention. But since we do a full GPU reset as part of sanitize, we
> have to load it from the driver again.

On resume, we don't know if we are coming from module load, resume from
S3, resume S3+RST, resume from S4, or resume from kexec. (S3+RST, kexec
are truly without our knowledge, the others we could feed the
information through but RST makes that moot.) Ergo, you cannot know if
the right fw image is loaded and aiui you should treat the state as
undefined and always reload. Does that make sense? Is there a way you
can query what fw is loaded and so skip instead?
-Chris


Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-26 Thread Sagar Arun Kamble



On 2/26/2018 10:27 PM, Daniele Ceraolo Spurio wrote:



On 25/02/18 22:17, Sagar Arun Kamble wrote:



On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:



On 23/02/18 06:04, Michal Wajdeczko wrote:
Right after GPU reset there will be a small window of time during 
which

some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
 don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio 
Signed-off-by: Michal Wajdeczko 
Cc: Daniele Ceraolo Spurio 
Cc: Sagar Arun Kamble 
Cc: Chris Wilson 
Cc: Michel Thierry 
---
  drivers/gpu/drm/i915/i915_gem.c    |  5 -
  drivers/gpu/drm/i915/intel_guc.h   |  5 +
  drivers/gpu/drm/i915/intel_huc.h   |  5 +
  drivers/gpu/drm/i915/intel_uc.c    | 14 ++
  drivers/gpu/drm/i915/intel_uc.h    |  1 +
  drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
  6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c 
b/drivers/gpu/drm/i915/i915_gem.c

index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
drm_i915_private *dev_priv)

  }
    i915_gem_revoke_fences(dev_priv);
+    intel_uc_prepare_to_reset(dev_priv);
    return err;
  }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct 
drm_i915_private *i915)
   * it may impact the display and we are uncertain about the 
stability

   * of the reset, so this could be applied to even earlier gen.
   */
-    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+    intel_uc_prepare_to_reset(i915);


This leaves the status with an incorrect value if we boot with 
i915.reset=0, 
It depends on whether WOPCM is locked (In case of resume from S3 I 
have seen it to be the case often).

Then we need not reload GuC also unless we are not doing full GPU reset.
but I think this is still the right place to add this in. 

Yes
There are several things with GuC that are going to break if we use 
reset=0 (e.g. doorbell cleanup) 

Can you elaborate how it might break.
i915 isn't currently communicating to GuC (destroy_doorbell) during 
doorbell cleanup and if we start communicating then it should
not fail as GuC will be available with reset=0.  Also 
__intel_uc_reset_hw isn't gated by reset modparam.


As you said we do always reset GuC no matter the value of the 
modparam, but that does not reset the doorbell HW. If we're coming out 
of S3 and the state has been preserved this will cause the doorbell HW
to be left in an unclean state, which could cause spurious doorbell
interrupts to be sent to GuC, not sure how the firmware handles those.
The code has moved since last time I looked at this in detail and I
think we're now most likely going to overwrite those unclean 
doorbells, but there are unlikely corner cases (preempt context 
failing to be created) where we might not do so.
More generally, my concern was that in the code flow we assume GuC and 
related HW to be reset and in need of a re-init when we come out of 
suspend when actually as you reported that might not be the case if we 
have reset=0. Even if we have no major concerns now, issues might 
arise in the future after code reworks or new feature additions if we 
start from a wrong assumption. Instead of changing the flow to 
consider the reset=0 (which isn't really a supported scenario) I think 
it'd be more useful to just enforce the fact that we don't support 
that use-case with GuC, hence my suggestion. And yes, I'm probably 
just being uber-paranoid :P



Makes sense. Agreed on sanitizing the modparams so that reset=0 is not
allowed with GuC.
We could also fix this if we could reset the doorbell unit alone at resume
and acquire the needed doorbells, but AFAIK guc_init_doorbell_hw was
earlier the way to reset all doorbells (and that needed GuC). As you said,
we can skip these changes though, since reset=0 isn't a supported scenario.
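
As a rough sketch of the modparam sanitization suggested above (an
illustration under assumptions, not the driver's actual code): the struct
modparams type, its fields and sanitize_modparams() are hypothetical. The
thread does not say which parameter should win, so this sketch assumes that
reset is forced back on when GuC use is requested.

```c
#include <stdbool.h>

/* Hypothetical subset of the module parameters; names are illustrative. */
struct modparams {
	bool reset;	 /* false models booting with i915.reset=0 */
	bool enable_guc; /* true when GuC use is requested */
};

/*
 * Reject the unsupported "GuC with reset=0" combination by re-enabling
 * reset (an assumption; disabling GuC would be the other option).
 * Returns true if a parameter had to be overridden.
 */
static bool sanitize_modparams(struct modparams *p)
{
	if (p->enable_guc && !p->reset) {
		p->reset = true;
		return true;
	}
	return false;
}
```

A caller could log a warning when sanitize_modparams() returns true, so the
override of the user's choice is visible.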

Daniele

so I wouldn't consider this a regression, but we might want to start 
sanitizing the modparams to not allow reset=0 with GuC.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+    }
  }
    int i915_gem_suspend(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_guc.h 
b/drivers/gpu/drm/i915/intel_guc.h

index 52856a9..0f6adb1 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct 
i915_vma *vma)
  struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, 
u32 size);

  u32 

Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-26 Thread Sagar Arun Kamble



On 2/27/2018 2:22 AM, Chris Wilson wrote:

Quoting Daniele Ceraolo Spurio (2018-02-26 16:57:11)


On 25/02/18 22:17, Sagar Arun Kamble wrote:


On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:


On 23/02/18 06:04, Michal Wajdeczko wrote:

Right after GPU reset there will be a small window of time during which
some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
  don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio 
Signed-off-by: Michal Wajdeczko 
Cc: Daniele Ceraolo Spurio 
Cc: Sagar Arun Kamble 
Cc: Chris Wilson 
Cc: Michel Thierry 
---
   drivers/gpu/drm/i915/i915_gem.c    |  5 -
   drivers/gpu/drm/i915/intel_guc.h   |  5 +
   drivers/gpu/drm/i915/intel_huc.h   |  5 +
   drivers/gpu/drm/i915/intel_uc.c    | 14 ++
   drivers/gpu/drm/i915/intel_uc.h    |  1 +
   drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
   6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c
b/drivers/gpu/drm/i915/i915_gem.c
index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct
drm_i915_private *dev_priv)
   }
     i915_gem_revoke_fences(dev_priv);
+    intel_uc_prepare_to_reset(dev_priv);
     return err;
   }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private
*i915)
    * it may impact the display and we are uncertain about the
stability
    * of the reset, so this could be applied to even earlier gen.
    */
-    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+    intel_uc_prepare_to_reset(i915);

This leaves the status with an incorrect value if we boot with
i915.reset=0,

It depends on whether WOPCM is locked (In case of resume from S3 I have
seen it to be the case often).
Then we need not reload GuC also unless we are not doing full GPU reset.

but I think this is still the right place to add this in.

Yes

There are several things with GuC that are going to break if we use
reset=0 (e.g. doorbell cleanup)

Can you elaborate how it might break.
i915 isn't currently communicating to GuC (destroy_doorbell) during
doorbell cleanup and if we start communicating then it should
not fail as GuC will be available with reset=0.  Also
__intel_uc_reset_hw isn't gated by reset modparam.

As you said we do always reset GuC no matter the value of the modparam,
but that does not reset the doorbell HW. If we're coming out of S3 and
the state has been preserved this will cause the doorbell HW to be left
in an unclean state, which could cause spurious doorbell interrupts to
be sent to GuC, not sure how the firmware handles those. The code has
moved since last time I looked at this in detail and I think we're now
most likely going to overwrite those unclean doorbells, but there are
unlikely corner cases (preempt context failing to be created) where we
might not do so.

I'm still going "wait, we can put the device into D3 and the GuC is
still powered?" Something feels wrong if the GuC retains state after the
HW is powered down.
GuC will be powered down with RC6. It is just that the firmware in WOPCM
can get wiped if memory is reset/powered down during resume. In the case
of mem sleep, WOPCM generally stays intact, and if we exit RC6 on resume
from sleep, the firmware will be restored into GuC without driver
intervention. But since we do a full GPU reset as part of sanitize, we
have to load it from the driver again.

  (So I'm wondering why this isn't just part of the
normal guc init path for module load/resume.)
-Chris


--
Thanks,
Sagar



Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-26 Thread Chris Wilson
Quoting Daniele Ceraolo Spurio (2018-02-26 16:57:11)
> 
> 
> On 25/02/18 22:17, Sagar Arun Kamble wrote:
> > 
> > 
> > On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:
> >>
> >>
> >> On 23/02/18 06:04, Michal Wajdeczko wrote:
> >>> Right after GPU reset there will be a small window of time during which
> >>> some of GuC/HuC fields will still show state before reset. Let's start
> >>> to fix that by sanitizing firmware status as we will use it shortly.
> >>>
> >>> v2: s/reset_prepare/prepare_to_reset (Michel)
> >>>  don't forget about gem_sanitize path (Daniele)
> >>>
> >>> Suggested-by: Daniele Ceraolo Spurio 
> >>> Signed-off-by: Michal Wajdeczko 
> >>> Cc: Daniele Ceraolo Spurio 
> >>> Cc: Sagar Arun Kamble 
> >>> Cc: Chris Wilson 
> >>> Cc: Michel Thierry 
> >>> ---
> >>>   drivers/gpu/drm/i915/i915_gem.c    |  5 -
> >>>   drivers/gpu/drm/i915/intel_guc.h   |  5 +
> >>>   drivers/gpu/drm/i915/intel_huc.h   |  5 +
> >>>   drivers/gpu/drm/i915/intel_uc.c    | 14 ++
> >>>   drivers/gpu/drm/i915/intel_uc.h    |  1 +
> >>>   drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
> >>>   6 files changed, 35 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> >>> b/drivers/gpu/drm/i915/i915_gem.c
> >>> index 14c855b..ae2c4ba 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>> @@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
> >>> drm_i915_private *dev_priv)
> >>>   }
> >>>     i915_gem_revoke_fences(dev_priv);
> >>> +    intel_uc_prepare_to_reset(dev_priv);
> >>>     return err;
> >>>   }
> >>> @@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private 
> >>> *i915)
> >>>    * it may impact the display and we are uncertain about the 
> >>> stability
> >>>    * of the reset, so this could be applied to even earlier gen.
> >>>    */
> >>> -    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
> >>> +    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
> >>> +    intel_uc_prepare_to_reset(i915);
> >>
> >> This leaves the status with an incorrect value if we boot with 
> >> i915.reset=0, 
> > It depends on whether WOPCM is locked (In case of resume from S3 I have 
> > seen it to be the case often).
> > Then we need not reload GuC also unless we are not doing full GPU reset.
> >> but I think this is still the right place to add this in. 
> > Yes
> >> There are several things with GuC that are going to break if we use 
> >> reset=0 (e.g. doorbell cleanup) 
> > Can you elaborate how it might break.
> > i915 isn't currently communicating to GuC (destroy_doorbell) during 
> > doorbell cleanup and if we start communicating then it should
> > not fail as GuC will be available with reset=0.  Also 
> > __intel_uc_reset_hw isn't gated by reset modparam.
> 
> As you said we do always reset GuC no matter the value of the modparam, 
> but that does not reset the doorbell HW. If we're coming out of S3 and 
> the state has been preserved this will cause the doorbell HW to be left
> in an unclean state, which could cause spurious doorbell interrupts to
> be sent to GuC, not sure how the firmware handles those. The code has
> moved since last time I looked at this in detail and I think we're now
> most likely going to overwrite those unclean doorbells, but there are 
> unlikely corner cases (preempt context failing to be created) where we 
> might not do so.

I'm still going "wait, we can put the device into D3 and the GuC is
still powered?" Something feels wrong if the GuC retains state after the
HW is powered down. (So I'm wondering why this isn't just part of the
normal guc init path for module load/resume.)
-Chris


Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-26 Thread Daniele Ceraolo Spurio



On 25/02/18 22:17, Sagar Arun Kamble wrote:



On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:



On 23/02/18 06:04, Michal Wajdeczko wrote:

Right after GPU reset there will be a small window of time during which
some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
 don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio 
Signed-off-by: Michal Wajdeczko 
Cc: Daniele Ceraolo Spurio 
Cc: Sagar Arun Kamble 
Cc: Chris Wilson 
Cc: Michel Thierry 
---
  drivers/gpu/drm/i915/i915_gem.c    |  5 -
  drivers/gpu/drm/i915/intel_guc.h   |  5 +
  drivers/gpu/drm/i915/intel_huc.h   |  5 +
  drivers/gpu/drm/i915/intel_uc.c    | 14 ++
  drivers/gpu/drm/i915/intel_uc.h    |  1 +
  drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
  6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c 
b/drivers/gpu/drm/i915/i915_gem.c

index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
drm_i915_private *dev_priv)

  }
    i915_gem_revoke_fences(dev_priv);
+    intel_uc_prepare_to_reset(dev_priv);
    return err;
  }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private 
*i915)
   * it may impact the display and we are uncertain about the 
stability

   * of the reset, so this could be applied to even earlier gen.
   */
-    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+    intel_uc_prepare_to_reset(i915);


This leaves the status with an incorrect value if we boot with 
i915.reset=0, 
It depends on whether WOPCM is locked (In case of resume from S3 I have 
seen it to be the case often).

Then we need not reload GuC also unless we are not doing full GPU reset.
but I think this is still the right place to add this in. 

Yes
There are several things with GuC that are going to break if we use 
reset=0 (e.g. doorbell cleanup) 

Can you elaborate how it might break.
i915 isn't currently communicating to GuC (destroy_doorbell) during 
doorbell cleanup and if we start communicating then it should
not fail as GuC will be available with reset=0.  Also 
__intel_uc_reset_hw isn't gated by reset modparam.


As you said we do always reset GuC no matter the value of the modparam, 
but that does not reset the doorbell HW. If we're coming out of S3 and 
the state has been preserved this will cause the doorbell HW to be left
in an unclean state, which could cause spurious doorbell interrupts to
be sent to GuC, not sure how the firmware handles those. The code has
moved since last time I looked at this in detail and I think we're now
most likely going to overwrite those unclean doorbells, but there are 
unlikely corner cases (preempt context failing to be created) where we 
might not do so.
More generally, my concern was that in the code flow we assume GuC and 
related HW to be reset and in need of a re-init when we come out of 
suspend when actually as you reported that might not be the case if we 
have reset=0. Even if we have no major concerns now, issues might arise 
in the future after code reworks or new feature additions if we start 
from a wrong assumption. Instead of changing the flow to consider the 
reset=0 (which isn't really a supported scenario) I think it'd be more 
useful to just enforce the fact that we don't support that use-case with 
GuC, hence my suggestion. And yes, I'm probably just being uber-paranoid :P


Daniele

so I wouldn't consider this a regression, but we might want to start 
sanitizing the modparams to not allow reset=0 with GuC.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


  WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+    }
  }
    int i915_gem_suspend(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_guc.h 
b/drivers/gpu/drm/i915/intel_guc.h

index 52856a9..0f6adb1 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma 
*vma)
  struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 
size);

  u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
  +static inline void intel_guc_prepare_to_reset(struct intel_guc *guc)
+{
+    intel_uc_fw_prepare_to_reset(&guc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_huc.h 
b/drivers/gpu/drm/i915/intel_huc.h

index 40039db..96e24f9 100644
--- a/drivers/gpu/drm/i915/intel_huc.h
+++ b/drivers/gpu/drm/i915/intel_huc.h
@@ -38,4 +38,9 @@ struct intel_huc {
  int 

Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-25 Thread Sagar Arun Kamble



On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:



On 23/02/18 06:04, Michal Wajdeczko wrote:

Right after GPU reset there will be a small window of time during which
some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
 don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio 
Signed-off-by: Michal Wajdeczko 
Cc: Daniele Ceraolo Spurio 
Cc: Sagar Arun Kamble 
Cc: Chris Wilson 
Cc: Michel Thierry 
---
  drivers/gpu/drm/i915/i915_gem.c    |  5 -
  drivers/gpu/drm/i915/intel_guc.h   |  5 +
  drivers/gpu/drm/i915/intel_huc.h   |  5 +
  drivers/gpu/drm/i915/intel_uc.c    | 14 ++
  drivers/gpu/drm/i915/intel_uc.h    |  1 +
  drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
  6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c 
b/drivers/gpu/drm/i915/i915_gem.c

index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
drm_i915_private *dev_priv)

  }
    i915_gem_revoke_fences(dev_priv);
+    intel_uc_prepare_to_reset(dev_priv);
    return err;
  }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private 
*i915)
   * it may impact the display and we are uncertain about the 
stability

   * of the reset, so this could be applied to even earlier gen.
   */
-    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+    intel_uc_prepare_to_reset(i915);


This leaves the status with an incorrect value if we boot with 
i915.reset=0, 
It depends on whether WOPCM is locked (In case of resume from S3 I have 
seen it to be the case often).

Then we need not reload GuC also unless we are not doing full GPU reset.
but I think this is still the right place to add this in. 

Yes
There are several things with GuC that are going to break if we use 
reset=0 (e.g. doorbell cleanup) 

Can you elaborate on how it might break?
i915 isn't currently communicating with GuC (destroy_doorbell) during
doorbell cleanup, and if we start communicating then it should
not fail, as GuC will be available with reset=0. Also,
__intel_uc_reset_hw isn't gated by the reset modparam.
so I wouldn't consider this a regression, but we might want to start 
sanitizing the modparams to not allow reset=0 with GuC.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


  WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+    }
  }
    int i915_gem_suspend(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_guc.h 
b/drivers/gpu/drm/i915/intel_guc.h

index 52856a9..0f6adb1 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma 
*vma)
  struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 
size);

  u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
  +static inline void intel_guc_prepare_to_reset(struct intel_guc *guc)
+{
+    intel_uc_fw_prepare_to_reset(&guc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_huc.h 
b/drivers/gpu/drm/i915/intel_huc.h

index 40039db..96e24f9 100644
--- a/drivers/gpu/drm/i915/intel_huc.h
+++ b/drivers/gpu/drm/i915/intel_huc.h
@@ -38,4 +38,9 @@ struct intel_huc {
  int intel_huc_init_hw(struct intel_huc *huc);
  int intel_huc_auth(struct intel_huc *huc);
  +static inline void intel_huc_prepare_to_reset(struct intel_huc *huc)
+{
+    intel_uc_fw_prepare_to_reset(&huc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_uc.c 
b/drivers/gpu/drm/i915/intel_uc.c

index 9f1bac6..8042d4b 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -445,3 +445,17 @@ void intel_uc_fini_hw(struct drm_i915_private 
*dev_priv)

  if (USES_GUC_SUBMISSION(dev_priv))
  gen9_disable_guc_interrupts(dev_priv);
  }
+
+void intel_uc_prepare_to_reset(struct drm_i915_private *i915)
+{
+    struct intel_huc *huc = &i915->huc;
+    struct intel_guc *guc = &i915->guc;
+
+    if (!USES_GUC(i915))
+        return;
+
+    GEM_BUG_ON(!HAS_GUC(i915));
+
+    intel_huc_prepare_to_reset(huc);
+    intel_guc_prepare_to_reset(guc);
+}
diff --git a/drivers/gpu/drm/i915/intel_uc.h 
b/drivers/gpu/drm/i915/intel_uc.h

index f2984e0..7a8ae58 100644
--- a/drivers/gpu/drm/i915/intel_uc.h
+++ b/drivers/gpu/drm/i915/intel_uc.h
@@ -39,6 +39,7 @@
  void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
  int intel_uc_init(struct drm_i915_private *dev_priv);
  void intel_uc_fini(struct drm_i915_private *dev_priv);
+void intel_uc_prepare_to_reset(struct 

Re: [Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

2018-02-23 Thread Daniele Ceraolo Spurio



On 23/02/18 06:04, Michal Wajdeczko wrote:

Right after GPU reset there will be a small window of time during which
some of GuC/HuC fields will still show state before reset. Let's start
to fix that by sanitizing firmware status as we will use it shortly.

v2: s/reset_prepare/prepare_to_reset (Michel)
 don't forget about gem_sanitize path (Daniele)

Suggested-by: Daniele Ceraolo Spurio 
Signed-off-by: Michal Wajdeczko 
Cc: Daniele Ceraolo Spurio 
Cc: Sagar Arun Kamble 
Cc: Chris Wilson 
Cc: Michel Thierry 
---
  drivers/gpu/drm/i915/i915_gem.c|  5 -
  drivers/gpu/drm/i915/intel_guc.h   |  5 +
  drivers/gpu/drm/i915/intel_huc.h   |  5 +
  drivers/gpu/drm/i915/intel_uc.c| 14 ++
  drivers/gpu/drm/i915/intel_uc.h|  1 +
  drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++
  6 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 14c855b..ae2c4ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct drm_i915_private 
*dev_priv)
}
  
  	i915_gem_revoke_fences(dev_priv);

+   intel_uc_prepare_to_reset(dev_priv);
  
  	return err;

  }
@@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 * it may impact the display and we are uncertain about the stability
 * of the reset, so this could be applied to even earlier gen.
 */
-   if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+   if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
+   intel_uc_prepare_to_reset(i915);


This leaves the status with an incorrect value if we boot with 
i915.reset=0, but I think this is still the right place to add this in. 
There are several things with GuC that are going to break if we use 
reset=0 (e.g. doorbell cleanup) so I wouldn't consider this a 
regression, but we might want to start sanitizing the modparams to not 
allow reset=0 with GuC.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+   }
  }
  
  int i915_gem_suspend(struct drm_i915_private *dev_priv)

diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 52856a9..0f6adb1 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma *vma)
  struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size);
  u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
  
+static inline void intel_guc_prepare_to_reset(struct intel_guc *guc)

+{
+   intel_uc_fw_prepare_to_reset(&guc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_huc.h b/drivers/gpu/drm/i915/intel_huc.h
index 40039db..96e24f9 100644
--- a/drivers/gpu/drm/i915/intel_huc.h
+++ b/drivers/gpu/drm/i915/intel_huc.h
@@ -38,4 +38,9 @@ struct intel_huc {
  int intel_huc_init_hw(struct intel_huc *huc);
  int intel_huc_auth(struct intel_huc *huc);
  
+static inline void intel_huc_prepare_to_reset(struct intel_huc *huc)

+{
+   intel_uc_fw_prepare_to_reset(&huc->fw);
+}
+
  #endif
diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
index 9f1bac6..8042d4b 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -445,3 +445,17 @@ void intel_uc_fini_hw(struct drm_i915_private *dev_priv)
if (USES_GUC_SUBMISSION(dev_priv))
gen9_disable_guc_interrupts(dev_priv);
  }
+
+void intel_uc_prepare_to_reset(struct drm_i915_private *i915)
+{
+   struct intel_huc *huc = &i915->huc;
+   struct intel_guc *guc = &i915->guc;
+
+   if (!USES_GUC(i915))
+           return;
+
+   GEM_BUG_ON(!HAS_GUC(i915));
+
+   intel_huc_prepare_to_reset(huc);
+   intel_guc_prepare_to_reset(guc);
+}
diff --git a/drivers/gpu/drm/i915/intel_uc.h b/drivers/gpu/drm/i915/intel_uc.h
index f2984e0..7a8ae58 100644
--- a/drivers/gpu/drm/i915/intel_uc.h
+++ b/drivers/gpu/drm/i915/intel_uc.h
@@ -39,6 +39,7 @@
  void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
  int intel_uc_init(struct drm_i915_private *dev_priv);
  void intel_uc_fini(struct drm_i915_private *dev_priv);
+void intel_uc_prepare_to_reset(struct drm_i915_private *dev_priv);
  
  static inline bool intel_uc_is_using_guc(void)

  {
diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h 
b/drivers/gpu/drm/i915/intel_uc_fw.h
index d5fd460..f1ee653 100644
--- a/drivers/gpu/drm/i915/intel_uc_fw.h
+++ b/drivers/gpu/drm/i915/intel_uc_fw.h
@@ -115,6 +115,12 @@ static inline bool intel_uc_fw_is_selected(struct 
intel_uc_fw *uc_fw)
return uc_fw->path != NULL;
  }
  
+static inline void