Re: [PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle

2022-06-09 Thread Doug Anderson
Hi,

On Wed, Jun 8, 2022 at 9:13 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> I've seen a few crashes like:
>
> CPU: 0 PID: 216 Comm: A618-worker Tainted: G        W 5.4.196 #7
> Hardware name: Google Wormdingler rev1+ INX panel board (DT)
> pstate: 20c00009 (nzCv daif +PAN +UAO)
> pc : msm_readl+0x14/0x34
> lr : a6xx_gpu_busy+0x40/0x80
> sp : ffffffc011b93ad0
> x29: ffffffc011b93ad0 x28: ffffffe77cba3000
> x27: 0000000000000001 x26: ffffffe77bb4c4ac
> x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
> x23: 0000000000000000 x22: ffffffa2f22bf020
> x21: ffffffa2f22bf000 x20: ffffffc011b93b10
> x19: ffffffc011bd4110 x18: 000000000000000e
> x17: 0000000000000004 x16: 000000000000000c
> x15: 000001be3a969450 x14: 0000000000000400
> x13: 00000000000101d6 x12: 0000000000003415
> x11: 0000000000000001 x10: 0000000000000000
> x9 : 0000000000000001 x8 : ffffffc011bd4000
> x7 : 0000000000000000 x6 : 0000000000000007
> x5 : ffffffc01d8b38f0 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : 0000000000000002
> x1 : 0000000000000000 x0 : ffffffc011bd4110
> Call trace:
>  msm_readl+0x14/0x34
>  a6xx_gpu_busy+0x40/0x80
>  msm_devfreq_get_dev_status+0x70/0x1d0
>  devfreq_simple_ondemand_func+0x34/0x100
>  update_devfreq+0x50/0xe8
>  qos_notifier_call+0x2c/0x64
>  qos_max_notifier_call+0x1c/0x2c
>  notifier_call_chain+0x58/0x98
>  __blocking_notifier_call_chain+0x74/0x84
>  blocking_notifier_call_chain+0x38/0x48
>  pm_qos_update_target+0xf8/0x19c
>  freq_qos_apply+0x54/0x6c
>  apply_constraint+0x60/0x104
>  __dev_pm_qos_update_request+0xb4/0x184
>  dev_pm_qos_update_request+0x38/0x58
>  msm_devfreq_idle_work+0x34/0x40
>  kthread_worker_fn+0x144/0x1c8
>  kthread+0x140/0x284
>  ret_from_fork+0x10/0x18
> Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
> ---[ end trace f6309767a42d0831 ]---
>
> Which smells a lot like touching hw after power collapse.  This seems
> a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
> in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
> powered.

I dunno if we want to change the commit message since I think my patch
[1] addresses the above problem?

[1] 
https://lore.kernel.org/r/20220609094716.v2.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid


> But, we've seen cases where the idle_work scheduled by
> msm_devfreq_idle() ends up racing with the resume path.  Which, again,
> shouldn't be a problem other than unnecessary freq changes.
>
> v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
>
> Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
> Signed-off-by: Rob Clark 
> Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
> ---
>  drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

In any case, your patch fixes the potential WARN_ON and seems like the
right thing to do, so:

Reviewed-by: Douglas Anderson 


Re: [PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle

2022-06-09 Thread Akhil P Oommen

On 6/8/2022 9:43 PM, Rob Clark wrote:

From: Rob Clark 

I've seen a few crashes like:

 CPU: 0 PID: 216 Comm: A618-worker Tainted: G        W 5.4.196 #7
 Hardware name: Google Wormdingler rev1+ INX panel board (DT)
 pstate: 20c00009 (nzCv daif +PAN +UAO)
 pc : msm_readl+0x14/0x34
 lr : a6xx_gpu_busy+0x40/0x80
 sp : ffffffc011b93ad0
 x29: ffffffc011b93ad0 x28: ffffffe77cba3000
 x27: 0000000000000001 x26: ffffffe77bb4c4ac
 x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
 x23: 0000000000000000 x22: ffffffa2f22bf020
 x21: ffffffa2f22bf000 x20: ffffffc011b93b10
 x19: ffffffc011bd4110 x18: 000000000000000e
 x17: 0000000000000004 x16: 000000000000000c
 x15: 000001be3a969450 x14: 0000000000000400
 x13: 00000000000101d6 x12: 0000000000003415
 x11: 0000000000000001 x10: 0000000000000000
 x9 : 0000000000000001 x8 : ffffffc011bd4000
 x7 : 0000000000000000 x6 : 0000000000000007
 x5 : ffffffc01d8b38f0 x4 : 0000000000000000
 x3 : 0000000000000000 x2 : 0000000000000002
 x1 : 0000000000000000 x0 : ffffffc011bd4110
 Call trace:
  msm_readl+0x14/0x34
  a6xx_gpu_busy+0x40/0x80
  msm_devfreq_get_dev_status+0x70/0x1d0
  devfreq_simple_ondemand_func+0x34/0x100
  update_devfreq+0x50/0xe8
  qos_notifier_call+0x2c/0x64
  qos_max_notifier_call+0x1c/0x2c
  notifier_call_chain+0x58/0x98
  __blocking_notifier_call_chain+0x74/0x84
  blocking_notifier_call_chain+0x38/0x48
  pm_qos_update_target+0xf8/0x19c
  freq_qos_apply+0x54/0x6c
  apply_constraint+0x60/0x104
  __dev_pm_qos_update_request+0xb4/0x184
  dev_pm_qos_update_request+0x38/0x58
  msm_devfreq_idle_work+0x34/0x40
  kthread_worker_fn+0x144/0x1c8
  kthread+0x140/0x284
  ret_from_fork+0x10/0x18
 Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
 ---[ end trace f6309767a42d0831 ]---

Which smells a lot like touching hw after power collapse.  This seems
a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
powered.

But, we've seen cases where the idle_work scheduled by
msm_devfreq_idle() ends up racing with the resume path.  Which, again,
shouldn't be a problem other than unnecessary freq changes.

v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()

Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark 
Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
---
  drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index eb8a6663f309..244511f85044 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -672,7 +672,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	msm_submit_retire(submit);
 
 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
-	pm_runtime_put_autosuspend(&gpu->pdev->dev);
 
 	spin_lock_irqsave(&ring->submit_lock, flags);
 	list_del(&submit->node);
@@ -686,6 +685,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 		msm_devfreq_idle(gpu);
 	mutex_unlock(&gpu->active_lock);
 
+	pm_runtime_put_autosuspend(&gpu->pdev->dev);
+
 	msm_gem_submit_put(submit);
 }
  


Reviewed-by: Akhil P Oommen 


-Akhil.




[PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle

2022-06-08 Thread Rob Clark
From: Rob Clark 

I've seen a few crashes like:

CPU: 0 PID: 216 Comm: A618-worker Tainted: G        W 5.4.196 #7
Hardware name: Google Wormdingler rev1+ INX panel board (DT)
pstate: 20c00009 (nzCv daif +PAN +UAO)
pc : msm_readl+0x14/0x34
lr : a6xx_gpu_busy+0x40/0x80
sp : ffffffc011b93ad0
x29: ffffffc011b93ad0 x28: ffffffe77cba3000
x27: 0000000000000001 x26: ffffffe77bb4c4ac
x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
x23: 0000000000000000 x22: ffffffa2f22bf020
x21: ffffffa2f22bf000 x20: ffffffc011b93b10
x19: ffffffc011bd4110 x18: 000000000000000e
x17: 0000000000000004 x16: 000000000000000c
x15: 000001be3a969450 x14: 0000000000000400
x13: 00000000000101d6 x12: 0000000000003415
x11: 0000000000000001 x10: 0000000000000000
x9 : 0000000000000001 x8 : ffffffc011bd4000
x7 : 0000000000000000 x6 : 0000000000000007
x5 : ffffffc01d8b38f0 x4 : 0000000000000000
x3 : 0000000000000000 x2 : 0000000000000002
x1 : 0000000000000000 x0 : ffffffc011bd4110
Call trace:
 msm_readl+0x14/0x34
 a6xx_gpu_busy+0x40/0x80
 msm_devfreq_get_dev_status+0x70/0x1d0
 devfreq_simple_ondemand_func+0x34/0x100
 update_devfreq+0x50/0xe8
 qos_notifier_call+0x2c/0x64
 qos_max_notifier_call+0x1c/0x2c
 notifier_call_chain+0x58/0x98
 __blocking_notifier_call_chain+0x74/0x84
 blocking_notifier_call_chain+0x38/0x48
 pm_qos_update_target+0xf8/0x19c
 freq_qos_apply+0x54/0x6c
 apply_constraint+0x60/0x104
 __dev_pm_qos_update_request+0xb4/0x184
 dev_pm_qos_update_request+0x38/0x58
 msm_devfreq_idle_work+0x34/0x40
 kthread_worker_fn+0x144/0x1c8
 kthread+0x140/0x284
 ret_from_fork+0x10/0x18
Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
---[ end trace f6309767a42d0831 ]---

Which smells a lot like touching hw after power collapse.  This seems
a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
powered.
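
For reference, the guard being referred to follows roughly this pattern (a
minimal sketch only, not the upstream a6xx_gpu_busy(); the helper name
gpu_busy_cycles() and the counter read are illustrative placeholders):

	/* sketch: sample a busy counter only while the hw is powered */
	static u64 gpu_busy_cycles(struct msm_gpu *gpu)
	{
		struct device *dev = &gpu->pdev->dev;
		u64 cycles;

		/*
		 * pm_runtime_get_if_in_use() takes a runpm reference only if
		 * the device is currently active; otherwise it returns 0 (or
		 * an error) without touching the device.
		 */
		if (pm_runtime_get_if_in_use(dev) <= 0)
			return 0;

		cycles = read_busy_counter(gpu);  /* hw counter read; placeholder */

		pm_runtime_put(dev);
		return cycles;
	}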

But, we've seen cases where the idle_work scheduled by
msm_devfreq_idle() ends up racing with the resume path.  Which, again,
shouldn't be a problem other than unnecessary freq changes.
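
Spelled out, the problematic interleaving with the old ordering would be
something like this (one plausible sequence, inferred from the trace above):

	retire_submit()                    runpm / devfreq
	---------------                    ---------------
	pm_runtime_put_autosuspend()
	                                   autosuspend timer expires,
	                                   GPU power collapses
	msm_devfreq_idle()
	  -> schedules idle_work
	                                   idle_work -> update_devfreq()
	                                     -> a6xx_gpu_busy() reads hw
	                                        that is already off

Scheduling idle_work while we still hold the runpm reference closes that
window.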

v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()

Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark 
Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
---
 drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index eb8a6663f309..244511f85044 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -672,7 +672,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	msm_submit_retire(submit);
 
 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
-	pm_runtime_put_autosuspend(&gpu->pdev->dev);
 
 	spin_lock_irqsave(&ring->submit_lock, flags);
 	list_del(&submit->node);
@@ -686,6 +685,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 		msm_devfreq_idle(gpu);
 	mutex_unlock(&gpu->active_lock);
 
+	pm_runtime_put_autosuspend(&gpu->pdev->dev);
+
 	msm_gem_submit_put(submit);
 }
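
With the patch applied, the tail of retire_submit() reads roughly as
follows (reassembled from the hunks above; the active_submits accounting
lines are assumed from surrounding context, not shown in the diff):

	/* Update devfreq on transition from active->idle: */
	mutex_lock(&gpu->active_lock);
	gpu->active_submits--;
	if (gpu->active_submits == 0)
		msm_devfreq_idle(gpu);
	mutex_unlock(&gpu->active_lock);

	/* Drop the runpm ref only after idle_work, if any, is scheduled */
	pm_runtime_put_autosuspend(&gpu->pdev->dev);

	msm_gem_submit_put(submit);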
 
-- 
2.36.1