Re: [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove

2020-05-28 Thread Janusz Krzysztofik
On Wed, 2020-05-27 at 21:12 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-05-18 20:17:19)
> > GGTT including its scratch page is not cleaned up until driver release.
> > Since DMA mappings still exist before scratch page cleanup, unmapping
> > them on last device close after the driver has been already removed may
> > be judged by intel-iommu code as a bug and result in kernel panic.
> > 
> > Since we abort requests and block user access to hardware on device
> > removal, there seems not much sense in still keeping GGTT.  Clean it up
> > on driver remove.  However, skip GGTT address space cleanup as its
> > mutext may still be needed if there are objects to be freed.  That
> > cleanup is always called on address space release after all.
> > 
> > [   81.290284] [ cut here ]
> > [   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
> > [   81.293558] invalid opcode:  [#1] PREEMPT SMP
> > [   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G U
> > 5.4.17 #2
> > [   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [   81.297959] RIP: 0010:intel_unmap+0x200/0x230
> > [   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 
> > 19 65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff 
> > <0f> 0b e8 19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
> > [   81.303425] RSP: 0018:c913fda0 EFLAGS: 00010246
> > [   81.304683] RAX:  RBX: 8882228dd0b0 RCX: 
> > 
> > [   81.306384] RDX: 1000 RSI: af801000 RDI: 
> > 8882228dd0b0
> > [   81.308086] RBP:  R08:  R09: 
> > 
> > [   81.309788] R10:  R11:  R12: 
> > af801000
> > [   81.311489] R13: 888223a0 R14: 1000 R15: 
> > 888223a0a2e8
> > [   81.313191] FS:  7f5408e3c940() GS:88822860() 
> > knlGS:
> > [   81.315116] CS:  0010 DS:  ES:  CR0: 80050033
> > [   81.316495] CR2: 01fc0048 CR3: 00022464a000 CR4: 
> > 06b0
> > [   81.318196] Call Trace:
> > [   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
> > [   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
> > [   81.321717]  i915_driver_release+0x33/0x90 [i915]
> > [   81.322856]  drm_release+0xbc/0xd0
> > [   81.323691]  __fput+0xcd/0x260
> > [   81.324447]  task_work_run+0x90/0xc0
> > [   81.325323]  do_syscall_64+0x3da/0x560
> > [   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [   81.327457] RIP: 0033:0x7f54096ecba3
> > [   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 
> > 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 
> > <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> > [   81.332741] RSP: 002b:7ffcc5165698 EFLAGS: 0246 ORIG_RAX: 
> > 0003
> > [   81.334546] RAX:  RBX:  RCX: 
> > 7f54096ecba3
> > [   81.336247] RDX: 005cc5d0 RSI: 0005 RDI: 
> > 0004
> > [   81.337949] RBP: 0003 R08: 005b8014 R09: 
> > 0004
> > [   81.339650] R10: 005cc650 R11: 0246 R12: 
> > 004022f0
> > [   81.341352] R13: 7ffcc5165850 R14:  R15: 
> > 
> > [   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
> > [   81.345015] ---[ end trace 010aae55e56f8998 ]---
> > 
> > Signed-off-by: Janusz Krzysztofik 
> > 
> > drm/i915: Defer GGTT vm address space fini to vm release
> 
> Hum?

That's a patch squashing left-over I apparently missed, sorry.

> 
> > Signed-off-by: Janusz Krzysztofik 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +
> >  drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
> >  drivers/gpu/drm/i915/i915_drv.c  |  2 ++
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index 66165b10256e..ff2b4f74149a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> > ggtt->vm.cleanup(>vm);
> >  
> > mutex_unlock(>vm.mutex);
> > -   i915_address_space_fini(>vm);
> 
> Ok, so this was defered to release. Where are we going to drop the final ref?
> And also - I can see that we have a:
> 
>   GEM_BUG_ON(i915_is_ggtt(vm));
> 
> in i915_vm_release().
> Which means that we probably don't drop the final ref and don't ever call
> i915_address_space_fini for ggtt.

Uh, that renders my solution invalid.  Let me take another approach.

Thanks,
Janusz

> 
> -Michał
> 
> >  
> > arch_phys_wc_del(ggtt->mtrr);
> >  
> > @@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> >   

Re: [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove

2020-05-27 Thread Michał Winiarski
Quoting Janusz Krzysztofik (2020-05-18 20:17:19)
> GGTT including its scratch page is not cleaned up until driver release.
> Since DMA mappings still exist before scratch page cleanup, unmapping
> them on last device close after the driver has been already removed may
> be judged by intel-iommu code as a bug and result in kernel panic.
> 
> Since we abort requests and block user access to hardware on device
> removal, there seems not much sense in still keeping GGTT.  Clean it up
> on driver remove.  However, skip GGTT address space cleanup as its
> mutext may still be needed if there are objects to be freed.  That
> cleanup is always called on address space release after all.
> 
> [   81.290284] [ cut here ]
> [   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
> [   81.293558] invalid opcode:  [#1] PREEMPT SMP
> [   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G U  
>   5.4.17 #2
> [   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [   81.297959] RIP: 0010:intel_unmap+0x200/0x230
> [   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 19 
> 65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff <0f> 0b 
> e8 19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
> [   81.303425] RSP: 0018:c913fda0 EFLAGS: 00010246
> [   81.304683] RAX:  RBX: 8882228dd0b0 RCX: 
> 
> [   81.306384] RDX: 1000 RSI: af801000 RDI: 
> 8882228dd0b0
> [   81.308086] RBP:  R08:  R09: 
> 
> [   81.309788] R10:  R11:  R12: 
> af801000
> [   81.311489] R13: 888223a0 R14: 1000 R15: 
> 888223a0a2e8
> [   81.313191] FS:  7f5408e3c940() GS:88822860() 
> knlGS:
> [   81.315116] CS:  0010 DS:  ES:  CR0: 80050033
> [   81.316495] CR2: 01fc0048 CR3: 00022464a000 CR4: 
> 06b0
> [   81.318196] Call Trace:
> [   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
> [   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
> [   81.321717]  i915_driver_release+0x33/0x90 [i915]
> [   81.322856]  drm_release+0xbc/0xd0
> [   81.323691]  __fput+0xcd/0x260
> [   81.324447]  task_work_run+0x90/0xc0
> [   81.325323]  do_syscall_64+0x3da/0x560
> [   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   81.327457] RIP: 0033:0x7f54096ecba3
> [   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 
> 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 
> 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> [   81.332741] RSP: 002b:7ffcc5165698 EFLAGS: 0246 ORIG_RAX: 
> 0003
> [   81.334546] RAX:  RBX:  RCX: 
> 7f54096ecba3
> [   81.336247] RDX: 005cc5d0 RSI: 0005 RDI: 
> 0004
> [   81.337949] RBP: 0003 R08: 005b8014 R09: 
> 0004
> [   81.339650] R10: 005cc650 R11: 0246 R12: 
> 004022f0
> [   81.341352] R13: 7ffcc5165850 R14:  R15: 
> 
> [   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
> [   81.345015] ---[ end trace 010aae55e56f8998 ]---
> 
> Signed-off-by: Janusz Krzysztofik 
> 
> drm/i915: Defer GGTT vm address space fini to vm release

Hum?

> 
> Signed-off-by: Janusz Krzysztofik 
> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
>  drivers/gpu/drm/i915/i915_drv.c  |  2 ++
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 66165b10256e..ff2b4f74149a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> ggtt->vm.cleanup(>vm);
>  
> mutex_unlock(>vm.mutex);
> -   i915_address_space_fini(>vm);

Ok, so this was defered to release. Where are we going to drop the final ref?
And also - I can see that we have a:

GEM_BUG_ON(i915_is_ggtt(vm));

in i915_vm_release().
Which means that we probably don't drop the final ref and don't ever call
i915_address_space_fini for ggtt.

-Michał

>  
> arch_phys_wc_del(ggtt->mtrr);
>  
> @@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> io_mapping_fini(>iomap);
>  }
>  
> +void i915_ggtt_driver_remove(struct drm_i915_private *i915)
> +{
> +   struct i915_ggtt *ggtt = >ggtt;
> +
> +   fini_aliasing_ppgtt(ggtt);
> +
> +   ggtt_cleanup_hw(ggtt);
> +}
> +
>  /**
>   * i915_ggtt_driver_release - Clean up GGTT hardware initialization
>   * @i915: i915 device
> @@ -718,10 +726,7 @@ void i915_ggtt_driver_release(struct 

[Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove

2020-05-18 Thread Janusz Krzysztofik
GGTT including its scratch page is not cleaned up until driver release.
Since DMA mappings still exist before scratch page cleanup, unmapping
them on last device close after the driver has been already removed may
be judged by intel-iommu code as a bug and result in kernel panic.

Since we abort requests and block user access to hardware on device
removal, there seems not much sense in still keeping GGTT.  Clean it up
on driver remove.  However, skip GGTT address space cleanup as its
mutext may still be needed if there are objects to be freed.  That
cleanup is always called on address space release after all.

[   81.290284] [ cut here ]
[   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
[   81.293558] invalid opcode:  [#1] PREEMPT SMP
[   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G U
5.4.17 #2
[   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   81.297959] RIP: 0010:intel_unmap+0x200/0x230
[   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 19 
65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff <0f> 0b e8 
19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
[   81.303425] RSP: 0018:c913fda0 EFLAGS: 00010246
[   81.304683] RAX:  RBX: 8882228dd0b0 RCX: 
[   81.306384] RDX: 1000 RSI: af801000 RDI: 8882228dd0b0
[   81.308086] RBP:  R08:  R09: 
[   81.309788] R10:  R11:  R12: af801000
[   81.311489] R13: 888223a0 R14: 1000 R15: 888223a0a2e8
[   81.313191] FS:  7f5408e3c940() GS:88822860() 
knlGS:
[   81.315116] CS:  0010 DS:  ES:  CR0: 80050033
[   81.316495] CR2: 01fc0048 CR3: 00022464a000 CR4: 06b0
[   81.318196] Call Trace:
[   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
[   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
[   81.321717]  i915_driver_release+0x33/0x90 [i915]
[   81.322856]  drm_release+0xbc/0xd0
[   81.323691]  __fput+0xcd/0x260
[   81.324447]  task_work_run+0x90/0xc0
[   81.325323]  do_syscall_64+0x3da/0x560
[   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   81.327457] RIP: 0033:0x7f54096ecba3
[   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 
00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
[   81.332741] RSP: 002b:7ffcc5165698 EFLAGS: 0246 ORIG_RAX: 
0003
[   81.334546] RAX:  RBX:  RCX: 7f54096ecba3
[   81.336247] RDX: 005cc5d0 RSI: 0005 RDI: 0004
[   81.337949] RBP: 0003 R08: 005b8014 R09: 0004
[   81.339650] R10: 005cc650 R11: 0246 R12: 004022f0
[   81.341352] R13: 7ffcc5165850 R14:  R15: 
[   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
[   81.345015] ---[ end trace 010aae55e56f8998 ]---

Signed-off-by: Janusz Krzysztofik 

drm/i915: Defer GGTT vm address space fini to vm release

Signed-off-by: Janusz Krzysztofik 
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
 drivers/gpu/drm/i915/i915_drv.c  |  2 ++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 66165b10256e..ff2b4f74149a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
ggtt->vm.cleanup(>vm);
 
mutex_unlock(>vm.mutex);
-   i915_address_space_fini(>vm);
 
arch_phys_wc_del(ggtt->mtrr);
 
@@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
io_mapping_fini(>iomap);
 }
 
+void i915_ggtt_driver_remove(struct drm_i915_private *i915)
+{
+   struct i915_ggtt *ggtt = >ggtt;
+
+   fini_aliasing_ppgtt(ggtt);
+
+   ggtt_cleanup_hw(ggtt);
+}
+
 /**
  * i915_ggtt_driver_release - Clean up GGTT hardware initialization
  * @i915: i915 device
@@ -718,10 +726,7 @@ void i915_ggtt_driver_release(struct drm_i915_private 
*i915)
struct i915_ggtt *ggtt = >ggtt;
struct pagevec *pvec;
 
-   fini_aliasing_ppgtt(ggtt);
-
intel_ggtt_fini_fences(ggtt);
-   ggtt_cleanup_hw(ggtt);
 
pvec = >mm.wc_stash.pvec;
if (pvec->nr) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d93ebdf3fa0e..f140ce5c171a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -501,6 +501,7 @@ int i915_ggtt_enable_hw(struct drm_i915_private *i915);
 void i915_ggtt_enable_guc(struct