Re: [PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle
Hi,

On Wed, Jun 8, 2022 at 9:13 AM Rob Clark wrote:
>
> From: Rob Clark
>
> I've seen a few crashes like:
>
>  CPU: 0 PID: 216 Comm: A618-worker Tainted: G W 5.4.196 #7
>  Hardware name: Google Wormdingler rev1+ INX panel board (DT)
>  pstate: 20c9 (nzCv daif +PAN +UAO)
>  pc : msm_readl+0x14/0x34
>  lr : a6xx_gpu_busy+0x40/0x80
>  sp : ffc011b93ad0
>  x29: ffc011b93ad0 x28: ffe77cba3000
>  x27: 0001 x26: ffe77bb4c4ac
>  x25: ffa2f227dfa0 x24: ffa2f22aab28
>  x23: x22: ffa2f22bf020
>  x21: ffa2f22bf000 x20: ffc011b93b10
>  x19: ffc011bd4110 x18: 000e
>  x17: 0004 x16: 000c
>  x15: 01be3a969450 x14: 0400
>  x13: 000101d6 x12: 3415
>  x11: 0001 x10:
>  x9 : 0001 x8 : ffc011bd4000
>  x7 : x6 : 0007
>  x5 : ffc01d8b38f0 x4 :
>  x3 : x2 : 0002
>  x1 : x0 : ffc011bd4110
>  Call trace:
>   msm_readl+0x14/0x34
>   a6xx_gpu_busy+0x40/0x80
>   msm_devfreq_get_dev_status+0x70/0x1d0
>   devfreq_simple_ondemand_func+0x34/0x100
>   update_devfreq+0x50/0xe8
>   qos_notifier_call+0x2c/0x64
>   qos_max_notifier_call+0x1c/0x2c
>   notifier_call_chain+0x58/0x98
>   __blocking_notifier_call_chain+0x74/0x84
>   blocking_notifier_call_chain+0x38/0x48
>   pm_qos_update_target+0xf8/0x19c
>   freq_qos_apply+0x54/0x6c
>   apply_constraint+0x60/0x104
>   __dev_pm_qos_update_request+0xb4/0x184
>   dev_pm_qos_update_request+0x38/0x58
>   msm_devfreq_idle_work+0x34/0x40
>   kthread_worker_fn+0x144/0x1c8
>   kthread+0x140/0x284
>   ret_from_fork+0x10/0x18
>  Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
>  ---[ end trace f6309767a42d0831 ]---
>
> Which smells a lot like touching hw after power collapse.  This seems
> a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
> in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
> powered.

I dunno if we want to change the commit message since I think my patch
[1] addresses the above problem?

[1] https://lore.kernel.org/r/20220609094716.v2.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid

> But, we've seen cases where the idle_work scheduled by
> msm_devfreq_idle() ends up racing with the resume path.  Which, again,
> shouldn't be a problem other than unnecessary freq changes.
>
> v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
>
> Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
> Signed-off-by: Rob Clark
> Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
> ---
>  drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

In any case, your patch fixes the potential WARN_ON and seems like the
right thing to do, so:

Reviewed-by: Douglas Anderson
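For context: the pm_runtime_get_if_in_use() guard discussed above follows
the usual runtime-PM sampling pattern. A minimal sketch of that pattern --
the device pointer and counter register below are placeholder assumptions,
not the literal a6xx_gpu_busy() body:

#include <linux/io.h>
#include <linux/pm_runtime.h>

/*
 * Sketch: take a usage-count reference only if the device is already
 * powered, so the sampling path never touches registers across a
 * power collapse.
 */
static u64 example_gpu_busy(struct device *dev, void __iomem *counter)
{
	u64 busy_cycles;

	/* <= 0 means suspended (or runtime PM disabled): skip the read */
	if (pm_runtime_get_if_in_use(dev) <= 0)
		return 0;

	busy_cycles = readl(counter);	/* safe: we hold a usage ref */

	pm_runtime_put(dev);

	return busy_cycles;
}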
Re: [PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle
On 6/8/2022 9:43 PM, Rob Clark wrote:
> From: Rob Clark
>
> I've seen a few crashes like:
>
>  CPU: 0 PID: 216 Comm: A618-worker Tainted: G W 5.4.196 #7
>  Hardware name: Google Wormdingler rev1+ INX panel board (DT)
>  pstate: 20c9 (nzCv daif +PAN +UAO)
>  pc : msm_readl+0x14/0x34
>  lr : a6xx_gpu_busy+0x40/0x80
>  sp : ffc011b93ad0
>  x29: ffc011b93ad0 x28: ffe77cba3000
>  x27: 0001 x26: ffe77bb4c4ac
>  x25: ffa2f227dfa0 x24: ffa2f22aab28
>  x23: x22: ffa2f22bf020
>  x21: ffa2f22bf000 x20: ffc011b93b10
>  x19: ffc011bd4110 x18: 000e
>  x17: 0004 x16: 000c
>  x15: 01be3a969450 x14: 0400
>  x13: 000101d6 x12: 3415
>  x11: 0001 x10:
>  x9 : 0001 x8 : ffc011bd4000
>  x7 : x6 : 0007
>  x5 : ffc01d8b38f0 x4 :
>  x3 : x2 : 0002
>  x1 : x0 : ffc011bd4110
>  Call trace:
>   msm_readl+0x14/0x34
>   a6xx_gpu_busy+0x40/0x80
>   msm_devfreq_get_dev_status+0x70/0x1d0
>   devfreq_simple_ondemand_func+0x34/0x100
>   update_devfreq+0x50/0xe8
>   qos_notifier_call+0x2c/0x64
>   qos_max_notifier_call+0x1c/0x2c
>   notifier_call_chain+0x58/0x98
>   __blocking_notifier_call_chain+0x74/0x84
>   blocking_notifier_call_chain+0x38/0x48
>   pm_qos_update_target+0xf8/0x19c
>   freq_qos_apply+0x54/0x6c
>   apply_constraint+0x60/0x104
>   __dev_pm_qos_update_request+0xb4/0x184
>   dev_pm_qos_update_request+0x38/0x58
>   msm_devfreq_idle_work+0x34/0x40
>   kthread_worker_fn+0x144/0x1c8
>   kthread+0x140/0x284
>   ret_from_fork+0x10/0x18
>  Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
>  ---[ end trace f6309767a42d0831 ]---
>
> Which smells a lot like touching hw after power collapse.  This seems
> a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
> in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
> powered.
>
> But, we've seen cases where the idle_work scheduled by
> msm_devfreq_idle() ends up racing with the resume path.  Which, again,
> shouldn't be a problem other than unnecessary freq changes.
>
> v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
>
> Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
> Signed-off-by: Rob Clark
> Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
> ---
>  drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index eb8a6663f309..244511f85044 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -672,7 +672,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
>  	msm_submit_retire(submit);
>
>  	pm_runtime_mark_last_busy(&gpu->pdev->dev);
> -	pm_runtime_put_autosuspend(&gpu->pdev->dev);
>
>  	spin_lock_irqsave(&ring->submit_lock, flags);
>  	list_del(&submit->node);
> @@ -686,6 +685,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
>  	msm_devfreq_idle(gpu);
>  	mutex_unlock(&gpu->active_lock);
>
> +	pm_runtime_put_autosuspend(&gpu->pdev->dev);
> +
>  	msm_gem_submit_put(submit);
>  }

Reviewed-by: Akhil P Oommen

-Akhil.
[PATCH v2] drm/msm: Switch ordering of runpm put vs devfreq_idle
From: Rob Clark

I've seen a few crashes like:

 CPU: 0 PID: 216 Comm: A618-worker Tainted: G W 5.4.196 #7
 Hardware name: Google Wormdingler rev1+ INX panel board (DT)
 pstate: 20c9 (nzCv daif +PAN +UAO)
 pc : msm_readl+0x14/0x34
 lr : a6xx_gpu_busy+0x40/0x80
 sp : ffc011b93ad0
 x29: ffc011b93ad0 x28: ffe77cba3000
 x27: 0001 x26: ffe77bb4c4ac
 x25: ffa2f227dfa0 x24: ffa2f22aab28
 x23: x22: ffa2f22bf020
 x21: ffa2f22bf000 x20: ffc011b93b10
 x19: ffc011bd4110 x18: 000e
 x17: 0004 x16: 000c
 x15: 01be3a969450 x14: 0400
 x13: 000101d6 x12: 3415
 x11: 0001 x10:
 x9 : 0001 x8 : ffc011bd4000
 x7 : x6 : 0007
 x5 : ffc01d8b38f0 x4 :
 x3 : x2 : 0002
 x1 : x0 : ffc011bd4110
 Call trace:
  msm_readl+0x14/0x34
  a6xx_gpu_busy+0x40/0x80
  msm_devfreq_get_dev_status+0x70/0x1d0
  devfreq_simple_ondemand_func+0x34/0x100
  update_devfreq+0x50/0xe8
  qos_notifier_call+0x2c/0x64
  qos_max_notifier_call+0x1c/0x2c
  notifier_call_chain+0x58/0x98
  __blocking_notifier_call_chain+0x74/0x84
  blocking_notifier_call_chain+0x38/0x48
  pm_qos_update_target+0xf8/0x19c
  freq_qos_apply+0x54/0x6c
  apply_constraint+0x60/0x104
  __dev_pm_qos_update_request+0xb4/0x184
  dev_pm_qos_update_request+0x38/0x58
  msm_devfreq_idle_work+0x34/0x40
  kthread_worker_fn+0x144/0x1c8
  kthread+0x140/0x284
  ret_from_fork+0x10/0x18
 Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
 ---[ end trace f6309767a42d0831 ]---

Which smells a lot like touching hw after power collapse.  This seems
a bit like a race/timing issue elsewhere, as pm_runtime_get_if_in_use()
in a6xx_gpu_busy() should have kept us from touching hw if it wasn't
powered.

But, we've seen cases where the idle_work scheduled by
msm_devfreq_idle() ends up racing with the resume path.  Which, again,
shouldn't be a problem other than unnecessary freq changes.

v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()

Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark
Link: https://lore.kernel.org/r/20210927152928.831245-1-robdcl...@gmail.com
---
 drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index eb8a6663f309..244511f85044 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -672,7 +672,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	msm_submit_retire(submit);

 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
-	pm_runtime_put_autosuspend(&gpu->pdev->dev);

 	spin_lock_irqsave(&ring->submit_lock, flags);
 	list_del(&submit->node);
@@ -686,6 +685,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	msm_devfreq_idle(gpu);
 	mutex_unlock(&gpu->active_lock);

+	pm_runtime_put_autosuspend(&gpu->pdev->dev);
+
 	msm_gem_submit_put(submit);
 }
-- 
2.36.1
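To make the reordering easier to see, here is a simplified sketch of
retire_submit() after this patch, with the submit-list and active_submits
bookkeeping elided. It is an illustration assuming the driver's msm_gpu.h
context, not the full function:

#include <linux/pm_runtime.h>

#include "msm_gpu.h"

static void retire_submit_sketch(struct msm_gpu *gpu)
{
	pm_runtime_mark_last_busy(&gpu->pdev->dev);

	/*
	 * The devfreq idle bookkeeping, which can schedule idle_work,
	 * now runs while we still hold the runtime-PM usage reference,
	 * so the device cannot start suspending underneath it ...
	 */
	msm_devfreq_idle(gpu);

	/*
	 * ... and only afterwards do we drop the reference and arm the
	 * autosuspend timer.  Before the patch, the put happened right
	 * after mark_last_busy(), opening a window for the freshly
	 * scheduled idle_work to race with suspend/resume.
	 */
	pm_runtime_put_autosuspend(&gpu->pdev->dev);
}

The scheduled idle_work itself still runs asynchronously, which is why the
pm_runtime_get_if_in_use() check in a6xx_gpu_busy() remains the backstop
against touching power-collapsed hardware.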