Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-28 Thread Ceraolo Spurio, Daniele




On 11/28/2022 5:08 AM, Ville Syrjälä wrote:
> On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
>> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
>>> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>>>> The fence is always initialized in huc_init_early, but the cleanup in
>>>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>>>> the debug object when HuC is disabled/not supported, which can in turn
>>>> trigger a warning if we try to register a new debug offset at the same
>>>> address on driver reload.
>>>>
>>>> To fix the issue, make sure to always run the cleanup code.
>>> This oopsing in ci now. Somehow the patchwork run did not
>>> hit that oops.
>> Can you point me to the oops log? I opened a few recent runs at random
>> but I wasn't able to find it.
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunp...@unbind-rebind.html


Thanks, it's indeed the same issue (and I've just confirmed that the
pre-merge results for the fix do mention that this test is moving from
incomplete to pass). From just a visual inspection I thought the problem
would only affect MTL, which does have HuC but only on one of the 2 GTs,
but it looks like this also impacts platforms without HuC at all (as
long as they also have no VCS engines). I'll try to get the fix reviewed
and merged ASAP.


Thanks,
Daniele




Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-28 Thread Ville Syrjälä
On Mon, Nov 28, 2022 at 01:10:58AM -0800, Ceraolo Spurio, Daniele wrote:
> 
> 
> On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> > On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> >> The fence is always initialized in huc_init_early, but the cleanup in
> >> huc_fini is only being run if HuC is enabled. This causes a leaking of
> >> the debug object when HuC is disabled/not supported, which can in turn
> >> trigger a warning if we try to register a new debug offset at the same
> >> address on driver reload.
> >>
> >> To fix the issue, make sure to always run the cleanup code.
> > This oopsing in ci now. Somehow the patchwork run did not
> > hit that oops.
> 
> Can you point me to the oops log? I opened a few recent runs at random 
> but I wasn't able to find it.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12425/fi-blb-e6850/igt@core_hotunp...@unbind-rebind.html

-- 
Ville Syrjälä
Intel


Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-28 Thread Ceraolo Spurio, Daniele




On 11/25/2022 5:54 AM, Ville Syrjälä wrote:
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
> This oopsing in ci now. Somehow the patchwork run did not
> hit that oops.


Can you point me to the oops log? I opened a few recent runs at random
but I wasn't able to find it.

Note that I did spot a potential issue that hits platforms that don't
have VCS engines (introduced by an MTL change to support HuC only on
the media GT) and I already have a fix for that on the ML:

https://patchwork.freedesktop.org/series/111288/

But without looking at the oops logs or knowing which platform it was
on, I don't know whether it's the same issue.


Daniele




>> Reported-by: Tvrtko Ursulin
>> Reported-by: Brian Norris
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio
>> Cc: Tvrtko Ursulin
>> Cc: Brian Norris
>> Cc: Alan Previn
>> Cc: John Harrison
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
>>
>>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>  2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> index fbc8bae14f76..83735a1528fe 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>>  
>>  void intel_huc_fini(struct intel_huc *huc)
>>  {
>> -	if (!intel_uc_fw_is_loadable(&huc->fw))
>> -		return;
>> -
>> +	/*
>> +	 * the fence is initialized in init_early, so we need to clean it up
>> +	 * even if HuC loading is off.
>> +	 */
>>  	delayed_huc_load_complete(huc);
>> -
>>  	i915_sw_fence_fini(&huc->delayed_load.fence);
>> -	intel_uc_fw_fini(&huc->fw);
>> +
>> +	if (intel_uc_fw_is_loadable(&huc->fw))
>> +		intel_uc_fw_fini(&huc->fw);
>>  }
>>  
>>  void intel_huc_suspend(struct intel_huc *huc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index dbd048b77e19..41f08b55790e 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>>  
>>  static const struct intel_uc_ops uc_ops_off = {
>>  	.init_hw = __uc_check_hw,
>> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>>  };
>>  
>>  static const struct intel_uc_ops uc_ops_on = {
>> --
>> 2.37.3




Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-25 Thread Ville Syrjälä
On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.

This oopsing in ci now. Somehow the patchwork run did not
hit that oops.

> 
> Reported-by: Tvrtko Ursulin 
> Reported-by: Brian Norris 
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Tvrtko Ursulin 
> Cc: Brian Norris 
> Cc: Alan Previn 
> Cc: John Harrison 
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
> 
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>  
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>  	delayed_huc_load_complete(huc);
> -
>  	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>  }
>  
>  void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>  
>  static const struct intel_uc_ops uc_ops_off = {
>  	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>  };
>  
>  static const struct intel_uc_ops uc_ops_on = {
> -- 
> 2.37.3

-- 
Ville Syrjälä
Intel


Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-22 Thread John Harrison

On 11/10/2022 16:56, Daniele Ceraolo Spurio wrote:

> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
>
> To fix the issue, make sure to always run the cleanup code.
>
> Reported-by: Tvrtko Ursulin
> Reported-by: Brian Norris
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio
> Cc: Tvrtko Ursulin
> Cc: Brian Norris
> Cc: Alan Previn
> Cc: John Harrison

Reviewed-by: John Harrison

> ---
>
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.
>
>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..83735a1528fe 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
>  
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -	if (!intel_uc_fw_is_loadable(&huc->fw))
> -		return;
> -
> +	/*
> +	 * the fence is initialized in init_early, so we need to clean it up
> +	 * even if HuC loading is off.
> +	 */
>  	delayed_huc_load_complete(huc);
> -
>  	i915_sw_fence_fini(&huc->delayed_load.fence);
> -	intel_uc_fw_fini(&huc->fw);
> +
> +	if (intel_uc_fw_is_loadable(&huc->fw))
> +		intel_uc_fw_fini(&huc->fw);
>  }
>  
>  void intel_huc_suspend(struct intel_huc *huc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index dbd048b77e19..41f08b55790e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
>  
>  static const struct intel_uc_ops uc_ops_off = {
>  	.init_hw = __uc_check_hw,
> +	.fini = __uc_fini, /* to clean-up the init_early initialization */
>  };
>  
>  static const struct intel_uc_ops uc_ops_on = {




Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-17 Thread Ceraolo Spurio, Daniele




On 11/16/2022 5:29 PM, Brian Norris wrote:

> Hi Daniele,
>
> On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
>> The fence is always initialized in huc_init_early, but the cleanup in
>> huc_fini is only being run if HuC is enabled. This causes a leaking of
>> the debug object when HuC is disabled/not supported, which can in turn
>> trigger a warning if we try to register a new debug offset at the same
>> address on driver reload.
>>
>> To fix the issue, make sure to always run the cleanup code.
>>
>> Reported-by: Tvrtko Ursulin
>> Reported-by: Brian Norris
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>> Signed-off-by: Daniele Ceraolo Spurio
>> Cc: Tvrtko Ursulin
>> Cc: Brian Norris
>> Cc: Alan Previn
>> Cc: John Harrison
>> ---
>>
>> Note: I didn't manage to repro the reported warning, but I did confirm
>> that we weren't correctly calling i915_sw_fence_fini and that this patch
>> fixes that.
> I *did* reproduce, and with this patch, I no longer reproduce. So:
>
> Tested-by: Brian Norris
>
> I see this differs very slightly from the draft version (which didn't
> work for me):
>
> https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53c...@intel.com/
>
> so presumably that diff is the fix.


The extra diff makes the driver call the cleanup function even if HuC is 
disabled, while the draft version just fixed the cleanup function 
without making sure it was being called.
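
To make that difference concrete, here is a small userspace sketch of an ops table with an optional fini hook. It is only an illustration under stated assumptions: the `toy_*` names are hypothetical, and it assumes the dispatcher silently skips hooks that are left unset, so a table without `.fini` means the cleanup never runs no matter how correct the cleanup function itself is.

```c
#include <assert.h>
#include <stddef.h>

struct toy_uc;

/* Hypothetical ops table: every hook is optional and may be NULL. */
struct toy_uc_ops {
	void (*fini)(struct toy_uc *uc);
};

struct toy_uc {
	const struct toy_uc_ops *ops;
	int fini_calls;	/* counts how often the cleanup actually ran */
};

/* The cleanup itself; in this toy it only records that it was called. */
static void toy_uc_fini_impl(struct toy_uc *uc)
{
	uc->fini_calls++;
}

/* Draft-style "off" table: the cleanup exists but is not hooked up. */
static const struct toy_uc_ops toy_ops_off_draft = {
	.fini = NULL,
};

/* Final-style "off" table: the cleanup is reachable even when "off". */
static const struct toy_uc_ops toy_ops_off_final = {
	.fini = toy_uc_fini_impl,
};

/* Dispatcher: an unset hook is skipped (assumption of this sketch). */
void toy_uc_fini(struct toy_uc *uc)
{
	assert(uc && uc->ops);
	if (uc->ops->fini)
		uc->ops->fini(uc);
}

/* Run one teardown and report how many times the cleanup ran. */
int toy_count_fini_calls(int use_final_table)
{
	struct toy_uc uc = {
		.ops = use_final_table ? &toy_ops_off_final : &toy_ops_off_draft,
		.fini_calls = 0,
	};

	toy_uc_fini(&uc);
	return uc.fini_calls;
}
```

Calling `toy_count_fini_calls(0)` models the draft (cleanup never reached) and `toy_count_fini_calls(1)` models the posted patch (cleanup reached once).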




> Thanks a bunch!

Thanks for testing!

Daniele

> Brian
>
>>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>>  2 files changed, 8 insertions(+), 5 deletions(-)




Re: [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-16 Thread Brian Norris
Hi Daniele,

On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.
> 
> Reported-by: Tvrtko Ursulin 
> Reported-by: Brian Norris 
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Tvrtko Ursulin 
> Cc: Brian Norris 
> Cc: Alan Previn 
> Cc: John Harrison 
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.

I *did* reproduce, and with this patch, I no longer reproduce. So:

Tested-by: Brian Norris 

I see this differs very slightly from the draft version (which didn't
work for me):

https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53c...@intel.com/

so presumably that diff is the fix.

Thanks a bunch!

Brian

>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)


[PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-10 Thread Daniele Ceraolo Spurio
The fence is always initialized in huc_init_early, but the cleanup in
huc_fini is only being run if HuC is enabled. This causes a leaking of
the debug object when HuC is disabled/not supported, which can in turn
trigger a warning if we try to register a new debug offset at the same
address on driver reload.

To fix the issue, make sure to always run the cleanup code.

Reported-by: Tvrtko Ursulin 
Reported-by: Brian Norris 
Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Tvrtko Ursulin 
Cc: Brian Norris 
Cc: Alan Previn 
Cc: John Harrison 
---

Note: I didn't manage to repro the reported warning, but I did confirm
that we weren't correctly calling i915_sw_fence_fini and that this patch
fixes that.

 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index fbc8bae14f76..83735a1528fe 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -300,13 +300,15 @@ int intel_huc_init(struct intel_huc *huc)
 
 void intel_huc_fini(struct intel_huc *huc)
 {
-	if (!intel_uc_fw_is_loadable(&huc->fw))
-		return;
-
+	/*
+	 * the fence is initialized in init_early, so we need to clean it up
+	 * even if HuC loading is off.
+	 */
 	delayed_huc_load_complete(huc);
-
 	i915_sw_fence_fini(&huc->delayed_load.fence);
-	intel_uc_fw_fini(&huc->fw);
+
+	if (intel_uc_fw_is_loadable(&huc->fw))
+		intel_uc_fw_fini(&huc->fw);
 }
 
 void intel_huc_suspend(struct intel_huc *huc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index dbd048b77e19..41f08b55790e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -718,6 +718,7 @@ int intel_uc_runtime_resume(struct intel_uc *uc)
 
 static const struct intel_uc_ops uc_ops_off = {
 	.init_hw = __uc_check_hw,
+	.fini = __uc_fini, /* to clean-up the init_early initialization */
 };
 
 static const struct intel_uc_ops uc_ops_on = {
-- 
2.37.3
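
For readers who want to see the failure mode in isolation, the following is a minimal userspace sketch of the leak described in the commit message. It is not the i915 code: `toy_debug_init()` and `toy_debug_fini()` are hypothetical stand-ins for debug-object tracking, and `struct toy_huc` only models the fence plus a "firmware is loadable" flag. The sketch assumes the object lands at the same address on reload, which is what makes the stale entry visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy tracker: remembers addresses of objects that are currently "live".
 * Hypothetical stand-in, not the kernel's debugobjects API. */
#define MAX_TRACKED 8
static const void *tracked[MAX_TRACKED];
static int warnings;

static void toy_debug_init(const void *addr)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == addr) {
			/* Initialized again without an intervening fini. */
			warnings++;
			printf("warning: %p is already tracked\n", (void *)addr);
			return;
		}
		if (!tracked[i] && free_slot < 0)
			free_slot = i;
	}
	assert(free_slot >= 0);	/* the toy table must not overflow */
	tracked[free_slot] = addr;
}

static void toy_debug_fini(const void *addr)
{
	for (int i = 0; i < MAX_TRACKED; i++)
		if (tracked[i] == addr)
			tracked[i] = NULL;
}

struct toy_huc {
	bool fw_loadable;	/* models "HuC firmware is loadable" */
	int fence;		/* models the delayed-load fence */
};

/* Always runs, whether or not the firmware is loadable. */
static void toy_huc_init_early(struct toy_huc *huc)
{
	toy_debug_init(&huc->fence);
}

/* Pre-patch shape: the early return also skips the fence cleanup. */
static void toy_huc_fini_old(struct toy_huc *huc)
{
	if (!huc->fw_loadable)
		return;
	toy_debug_fini(&huc->fence);
}

/* Post-patch shape: the fence cleanup is unconditional; only the
 * firmware cleanup (omitted here) would depend on fw_loadable. */
static void toy_huc_fini_new(struct toy_huc *huc)
{
	toy_debug_fini(&huc->fence);
}

/* Load, unload and reload with HuC disabled; return the warning count. */
int toy_reload_warnings(bool use_new_fini)
{
	static struct toy_huc huc;	/* static: same address on "reload" */

	for (int i = 0; i < MAX_TRACKED; i++)
		tracked[i] = NULL;
	warnings = 0;

	toy_huc_init_early(&huc);	/* first driver load */
	if (use_new_fini)
		toy_huc_fini_new(&huc);
	else
		toy_huc_fini_old(&huc);
	toy_huc_init_early(&huc);	/* driver reload */
	return warnings;
}
```

With the old teardown, `toy_reload_warnings(false)` reports one warning because the fence is still tracked when it is initialized again; with the new teardown, `toy_reload_warnings(true)` reports none.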