Re: [Freedreno] [PATCH 1/4] drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()

2022-12-06 Thread Akhil P Oommen
On 12/5/2022 2:10 PM, Dan Carpenter wrote:
> On Sun, Dec 04, 2022 at 04:11:41AM +0530, Akhil P Oommen wrote:
>> Fix the below kernel panic due to null pointer access:
>> [   18.504431] Unable to handle kernel NULL pointer dereference at virtual 
>> address 0048
>> [   18.513464] Mem abort info:
>> [   18.516346]   ESR = 0x9605
>> [   18.520204]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [   18.525706]   SET = 0, FnV = 0
>> [   18.528878]   EA = 0, S1PTW = 0
>> [   18.532117]   FSC = 0x05: level 1 translation fault
>> [   18.537138] Data abort info:
>> [   18.540110]   ISV = 0, ISS = 0x0005
>> [   18.544060]   CM = 0, WnR = 0
>> [   18.547109] user pgtable: 4k pages, 39-bit VAs, pgdp=000112826000
>> [   18.553738] [0048] pgd=, 
>> p4d=, pud=
>> [   18.562690] Internal error: Oops: 9605 [#1] PREEMPT SMP
>> **Snip**
>> [   18.696758] Call trace:
>> [   18.699278]  adreno_gpu_cleanup+0x30/0x88
>> [   18.703396]  a6xx_destroy+0xc0/0x130
>> [   18.707066]  a6xx_gpu_init+0x308/0x424
> Fixes: 17e822f7591f ("drm/msm: fix unbalanced pm_runtime_enable in 
> adreno_gpu_{init, cleanup}")
>
> Let's add Jonathan to the CC list so he can Ack your patch.
Thanks, will post a v2.

-Akhil.
>
> Although the real issue is that a6xx_gpu_init has bad error handling.
>
> The a6xx_destroy() function supposed to free *everything* so then the
> question becomes how do we avoid freeing something which was not
> allocated?  With normal kernel style we just free things one by one
> in the reverse order from how they were allocated.  See my blog for more
> details:
> https://staticthinking.wordpress.com/2022/04/28/free-the-last-thing-style/
Nice post. Thanks for sharing.
>
> However this code is written in One Function Frees Everything Style
> which is difficult to review and prone to bugs.  The common mistakes are
> the kind of NULL dereference that you've seen, double frees, and missing
> frees.
>
> The only way to read this code is to open a new text editor window and
> line up the allocations with the frees.
>
>   1725  static void a6xx_destroy(struct msm_gpu *gpu)
>   1726  {
>   1727  struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   1728  struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   1729  
>   1730  if (a6xx_gpu->sqe_bo) {
>   1731  msm_gem_unpin_iova(a6xx_gpu->sqe_bo, gpu->aspace);
>   1732  drm_gem_object_put(a6xx_gpu->sqe_bo);
>
> These unpin/put must be done together and should be in their own
> function.  The ->sqe_bo pointer is allocated in a6xx_ucode_init().  It's
> assigned to an error pointer, but then set to NULL on error or after a
> free.  So this is okay.
Agree. This warrants a helper function. I count 11 instances where it will be 
useful.

>
>   1733  }
>   1734  
>   1735  if (a6xx_gpu->shadow_bo) {
>
> ->shadow_bo is allocated in hw_init().  Should there be a call to
> msm_gem_put_vaddr(a6xx_gpu->shadow)?  It's unclear.  [QUESTION #1]
Yes. This should be freed with msm_gem_kernel_put() which takes care of freeing 
vaddr.
>
>   1736  msm_gem_unpin_iova(a6xx_gpu->shadow_bo, gpu->aspace);
>   1737  drm_gem_object_put(a6xx_gpu->shadow_bo);
>   1738  }
>   1739  
>   1740  a6xx_llc_slices_destroy(a6xx_gpu);
>
> This has IS_ERR_OR_NULL() checks so it's okay.
>
>   1741  
>   1742  a6xx_gmu_remove(a6xx_gpu);
>
> This uses a gmu->initialized flag which allows it to safely clean up
> everything.  Fine.
>
>   1743  
>   1744  adreno_gpu_cleanup(adreno_gpu);
>
> This function has the bug that you identified.  Let's dig into it.
> (With normal kernel error handling you can read the error handling by
> looking at the label name but with this style we need to jump around and
> compare code from different files).
>
>   1745  
>   1746  kfree(a6xx_gpu);
>   1747  }
>
> drivers/gpu/drm/msm/adreno/adreno_gpu.c
>   1079  void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
>   1080  {
>   1081  struct msm_gpu *gpu = _gpu->base;
>   1082  struct msm_drm_private *priv = gpu->dev->dev_private;
>   1083  unsigned int i;
>   1084  
>   1085  for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++)
>   1086  release_firmware(adreno_gpu->fw[i]);
>
> This is okay.  ->fw[i] is either valid or NULL and releasing a NULL is
> fine.
>
>   1087  
>   1088  if (pm_runtime_enabled(>gpu_pdev->dev))
>
> This is the bug you found.
>
>   1089  pm_runtime_disable(>gpu_pdev->dev);
>   1090  
>   1091  msm_gpu_cleanup(_gpu->base);
>
> Let's dig into msm_gpu_cleanup().
>
>   1092  }
>
> drivers/gpu/drm/msm/msm_gpu.c
>   1006  void msm_gpu_cleanup(struct msm_gpu *gpu)
>   1007  {
>   1008  int i;
>   1009  
>   1010  DBG("%s", gpu->name);
>   1011  
>   1012  for (i = 0; i < 

Re: [Freedreno] [PATCH 1/4] drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()

2022-12-05 Thread Dan Carpenter
On Sun, Dec 04, 2022 at 04:11:41AM +0530, Akhil P Oommen wrote:
> Fix the below kernel panic due to null pointer access:
> [   18.504431] Unable to handle kernel NULL pointer dereference at virtual 
> address 0048
> [   18.513464] Mem abort info:
> [   18.516346]   ESR = 0x9605
> [   18.520204]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   18.525706]   SET = 0, FnV = 0
> [   18.528878]   EA = 0, S1PTW = 0
> [   18.532117]   FSC = 0x05: level 1 translation fault
> [   18.537138] Data abort info:
> [   18.540110]   ISV = 0, ISS = 0x0005
> [   18.544060]   CM = 0, WnR = 0
> [   18.547109] user pgtable: 4k pages, 39-bit VAs, pgdp=000112826000
> [   18.553738] [0048] pgd=, p4d=, 
> pud=
> [   18.562690] Internal error: Oops: 9605 [#1] PREEMPT SMP
> **Snip**
> [   18.696758] Call trace:
> [   18.699278]  adreno_gpu_cleanup+0x30/0x88
> [   18.703396]  a6xx_destroy+0xc0/0x130
> [   18.707066]  a6xx_gpu_init+0x308/0x424

Fixes: 17e822f7591f ("drm/msm: fix unbalanced pm_runtime_enable in 
adreno_gpu_{init, cleanup}")

Let's add Jonathan to the CC list so he can Ack your patch.

Although the real issue is that a6xx_gpu_init has bad error handling.

The a6xx_destroy() function supposed to free *everything* so then the
question becomes how do we avoid freeing something which was not
allocated?  With normal kernel style we just free things one by one
in the reverse order from how they were allocated.  See my blog for more
details:
https://staticthinking.wordpress.com/2022/04/28/free-the-last-thing-style/

However this code is written in One Function Frees Everything Style
which is difficult to review and prone to bugs.  The common mistakes are
the kind of NULL dereference that you've seen, double frees, and missing
frees.

The only way to read this code is to open a new text editor window and
line up the allocations with the frees.

  1725  static void a6xx_destroy(struct msm_gpu *gpu)
  1726  {
  1727  struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
  1728  struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
  1729  
  1730  if (a6xx_gpu->sqe_bo) {
  1731  msm_gem_unpin_iova(a6xx_gpu->sqe_bo, gpu->aspace);
  1732  drm_gem_object_put(a6xx_gpu->sqe_bo);

These unpin/put must be done together and should be in their own
function.  The ->sqe_bo pointer is allocated in a6xx_ucode_init().  It's
assigned to an error pointer, but then set to NULL on error or after a
free.  So this is okay.

  1733  }
  1734  
  1735  if (a6xx_gpu->shadow_bo) {

->shadow_bo is allocated in hw_init().  Should there be a call to
msm_gem_put_vaddr(a6xx_gpu->shadow)?  It's unclear.  [QUESTION #1]

  1736  msm_gem_unpin_iova(a6xx_gpu->shadow_bo, gpu->aspace);
  1737  drm_gem_object_put(a6xx_gpu->shadow_bo);
  1738  }
  1739  
  1740  a6xx_llc_slices_destroy(a6xx_gpu);

This has IS_ERR_OR_NULL() checks so it's okay.

  1741  
  1742  a6xx_gmu_remove(a6xx_gpu);

This uses a gmu->initialized flag which allows it to safely clean up
everything.  Fine.

  1743  
  1744  adreno_gpu_cleanup(adreno_gpu);

This function has the bug that you identified.  Let's dig into it.
(With normal kernel error handling you can read the error handling by
looking at the label name but with this style we need to jump around and
compare code from different files).

  1745  
  1746  kfree(a6xx_gpu);
  1747  }

drivers/gpu/drm/msm/adreno/adreno_gpu.c
  1079  void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
  1080  {
  1081  struct msm_gpu *gpu = _gpu->base;
  1082  struct msm_drm_private *priv = gpu->dev->dev_private;
  1083  unsigned int i;
  1084  
  1085  for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++)
  1086  release_firmware(adreno_gpu->fw[i]);

This is okay.  ->fw[i] is either valid or NULL and releasing a NULL is
fine.

  1087  
  1088  if (pm_runtime_enabled(>gpu_pdev->dev))

This is the bug you found.

  1089  pm_runtime_disable(>gpu_pdev->dev);
  1090  
  1091  msm_gpu_cleanup(_gpu->base);

Let's dig into msm_gpu_cleanup().

  1092  }

drivers/gpu/drm/msm/msm_gpu.c
  1006  void msm_gpu_cleanup(struct msm_gpu *gpu)
  1007  {
  1008  int i;
  1009  
  1010  DBG("%s", gpu->name);
  1011  
  1012  for (i = 0; i < ARRAY_SIZE(gpu->rb); i++) {
  1013  msm_ringbuffer_destroy(gpu->rb[i]);

Destroying an error pointer is fine so this is okay.

  1014  gpu->rb[i] = NULL;
  1015  }
  1016  
  1017  msm_gem_kernel_put(gpu->memptrs_bo, gpu->aspace);
^^^
[QUESTION #2] Passing an error pointer here will trigger a stack trace
so this is bug.  The drivers->aspace pointer is allocted in

[Freedreno] [PATCH 1/4] drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()

2022-12-03 Thread Akhil P Oommen
Fix the below kernel panic due to null pointer access:
[   18.504431] Unable to handle kernel NULL pointer dereference at virtual 
address 0048
[   18.513464] Mem abort info:
[   18.516346]   ESR = 0x9605
[   18.520204]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.525706]   SET = 0, FnV = 0
[   18.528878]   EA = 0, S1PTW = 0
[   18.532117]   FSC = 0x05: level 1 translation fault
[   18.537138] Data abort info:
[   18.540110]   ISV = 0, ISS = 0x0005
[   18.544060]   CM = 0, WnR = 0
[   18.547109] user pgtable: 4k pages, 39-bit VAs, pgdp=000112826000
[   18.553738] [0048] pgd=, p4d=, 
pud=
[   18.562690] Internal error: Oops: 9605 [#1] PREEMPT SMP
**Snip**
[   18.696758] Call trace:
[   18.699278]  adreno_gpu_cleanup+0x30/0x88
[   18.703396]  a6xx_destroy+0xc0/0x130
[   18.707066]  a6xx_gpu_init+0x308/0x424
[   18.710921]  adreno_bind+0x178/0x288
[   18.714590]  component_bind_all+0xe0/0x214
[   18.718797]  msm_drm_bind+0x1d4/0x614
[   18.722566]  try_to_bring_up_aggregate_device+0x16c/0x1b8
[   18.728105]  __component_add+0xa0/0x158
[   18.732048]  component_add+0x20/0x2c
[   18.735719]  adreno_probe+0x40/0xc0
[   18.739300]  platform_probe+0xb4/0xd4
[   18.743068]  really_probe+0xfc/0x284
[   18.746738]  __driver_probe_device+0xc0/0xec
[   18.751129]  driver_probe_device+0x48/0x110
[   18.755421]  __device_attach_driver+0xa8/0xd0
[   18.759900]  bus_for_each_drv+0x90/0xdc
[   18.763843]  __device_attach+0xfc/0x174
[   18.767786]  device_initial_probe+0x20/0x2c
[   18.772090]  bus_probe_device+0x40/0xa0
[   18.776032]  deferred_probe_work_func+0x94/0xd0
[   18.780686]  process_one_work+0x190/0x3d0
[   18.784805]  worker_thread+0x280/0x3d4
[   18.788659]  kthread+0x104/0x1c0
[   18.791981]  ret_from_fork+0x10/0x20
[   18.795654] Code: f9400408 aa0003f3 aa1f03f4 91142015 (f9402516)
[   18.801913] ---[ end trace  ]---
[   18.809039] Kernel panic - not syncing: Oops: Fatal exception

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 382fb7f9e497..118d07e5c66c 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -1073,13 +1073,13 @@ int adreno_gpu_init(struct drm_device *drm, struct 
platform_device *pdev,
 void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
 {
struct msm_gpu *gpu = _gpu->base;
-   struct msm_drm_private *priv = gpu->dev->dev_private;
+   struct msm_drm_private *priv = gpu->dev ? gpu->dev->dev_private : NULL;
unsigned int i;
 
for (i = 0; i < ARRAY_SIZE(adreno_gpu->info->fw); i++)
release_firmware(adreno_gpu->fw[i]);
 
-   if (pm_runtime_enabled(>gpu_pdev->dev))
+   if (priv && pm_runtime_enabled(>gpu_pdev->dev))
pm_runtime_disable(>gpu_pdev->dev);
 
msm_gpu_cleanup(_gpu->base);
-- 
2.7.4