Re: [Intel-gfx] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

2020-09-17 Thread Daniel Vetter
On Thu, Jul 30, 2020 at 06:45:14PM +0200, Thomas Hellström (Intel) wrote:
> 
> On 7/30/20 3:17 PM, Daniel Vetter wrote:
> > On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
> >  wrote:
> > > 
> > > On 7/28/20 3:58 PM, Daniel Vetter wrote:
> > > > GPU drivers need this in their shrinkers, to be able to throw out
> > > > mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
> > > > but that loop is resolved by trylocking in shrinkers.
> > > > 
> > > > So full hierarchy is now (ignore some of the other branches we already
> > > > have primed):
> > > > 
> > > > mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
> > > > 
> > > > I hope that's not inconsistent with anything mm or fs does, adding
> > > > relevant people.
> > > > 
> > > Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
> > > don't allocate any memory AFAICT.
> > > 
> > > Since huge page-table-entry splitting may happen under the i_mmap_lock
> > > from unmap_mapping_range() it might be worth figuring out how new page
> > > directory pages are allocated, though.
> > ofc I'm not an mm expert at all, but I did try to scroll through all
> > i_mmap_lock_write/read callers. Found the following:
> > 
> > - kernel/events/uprobes.c in build_map_info:
> > 
> >  /*
> >   * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
> >   * reclaim. This is optimistic, no harm done if it fails.
> >   */
> > 
> > - I got lost in the hugetlb.c code and couldn't convince myself it's
> > not allocating page directories at various levels with something else
> > than GFP_KERNEL.
> > 
> > So looks like the recursion is clearly there and known, but the
> > hugepage code is too complex and flying over my head.
> > -Daniel
> 
> OK, so I inverted your annotation and ran a memory hog, and got the below
> splat. So clearly your proposed reclaim->i_mmap_lock locking order is an
> already established one.
> 
> So
> 
> Reviewed-by: Thomas Hellström 

No one complaining that this is a terrible idea and two reviews from
people who know stuff, so I went ahead and pushed this to drm-misc-next.

Thanks for taking a look at this.
-Daniel

> 
> 8<-
> 
> [  308.324654] WARNING: possible circular locking dependency detected
> [  308.324655] 5.8.0-rc2+ #16 Not tainted
> [  308.324656] --
> [  308.324657] kswapd0/98 is trying to acquire lock:
> [  308.324658] 92a16f758428 (>i_mmap_rwsem){}-{3:3}, at:
> rmap_walk_file+0x1c0/0x2f0
> [  308.324663]
>    but task is already holding lock:
> [  308.324664] b0960240 (fs_reclaim){+.+.}-{0:0}, at:
> __fs_reclaim_acquire+0x5/0x30
> [  308.324666]
>    which lock already depends on the new lock.
> 
> [  308.324667]
>    the existing dependency chain (in reverse order) is:
> [  308.324667]
>    -> #1 (fs_reclaim){+.+.}-{0:0}:
> [  308.324670]    fs_reclaim_acquire+0x34/0x40
> [  308.324672]    dma_resv_lockdep+0x186/0x224
> [  308.324675]    do_one_initcall+0x5d/0x2c0
> [  308.324676]    kernel_init_freeable+0x222/0x288
> [  308.324678]    kernel_init+0xa/0x107
> [  308.324679]    ret_from_fork+0x1f/0x30
> [  308.324680]
>    -> #0 (>i_mmap_rwsem){}-{3:3}:
> [  308.324682]    __lock_acquire+0x119f/0x1fc0
> [  308.324683]    lock_acquire+0xa4/0x3b0
> [  308.324685]    down_read+0x2d/0x110
> [  308.324686]    rmap_walk_file+0x1c0/0x2f0
> [  308.324687]    page_referenced+0x133/0x150
> [  308.324689]    shrink_active_list+0x142/0x610
> [  308.324690]    balance_pgdat+0x229/0x620
> [  308.324691]    kswapd+0x200/0x470
> [  308.324693]    kthread+0x11f/0x140
> [  308.324694]    ret_from_fork+0x1f/0x30
> [  308.324694]
>    other info that might help us debug this:
> 
> [  308.324695]  Possible unsafe locking scenario:
> 
> [  308.324695]    CPU0    CPU1
> [  308.324696]        
> [  308.324696]   lock(fs_reclaim);
> [  308.324697] lock(>i_mmap_rwsem);
> [  308.324698]    lock(fs_reclaim);
> [  308.324699]   lock(>i_mmap_rwsem);
> [  308.324699]
>     *** DEADLOCK ***
> 
> [  308.324700] 1 lock held by kswapd0/98:
> [  308.324701]  #0: b0960240 (fs_reclaim){+.+.}-{0:0}, at:
> __fs_reclaim_acquire+0x5/0x30
> [  308.324702]
>    stack backtrace:
> [  308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16
> [  308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 07/29/2019
> [  308.324706] Call Trace:
> [  308.324710]  dump_stack+0x92/0xc8
> [  308.324711]  check_noncircular+0x12d/0x150
> [  308.324713]  __lock_acquire+0x119f/0x1fc0
> [  308.324715]  lock_acquire+0xa4/0x3b0
> [  

Re: [Intel-gfx] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

2020-07-30 Thread Intel


On 7/30/20 3:17 PM, Daniel Vetter wrote:

On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
 wrote:


On 7/28/20 3:58 PM, Daniel Vetter wrote:

GPU drivers need this in their shrinkers, to be able to throw out
mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already
have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding
relevant people.


Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock
from unmap_mapping_range() it might be worth figuring out how new page
directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all
i_mmap_lock_write/read callers. Found the following:

- kernel/events/uprobes.c in build_map_info:

 /*
  * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
  * reclaim. This is optimistic, no harm done if it fails.
  */

- I got lost in the hugetlb.c code and couldn't convince myself it's
not allocating page directories at various levels with something else
than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the
hugepage code is too complex and flying over my head.
-Daniel


OK, so I inverted your annotation and ran a memory hog, and got the 
below splat. So clearly your proposed reclaim->i_mmap_lock locking order 
is an already established one.


So

Reviewed-by: Thomas Hellström 

8<-

[  308.324654] WARNING: possible circular locking dependency detected
[  308.324655] 5.8.0-rc2+ #16 Not tainted
[  308.324656] --
[  308.324657] kswapd0/98 is trying to acquire lock:
[  308.324658] 92a16f758428 (>i_mmap_rwsem){}-{3:3}, 
at: rmap_walk_file+0x1c0/0x2f0

[  308.324663]
   but task is already holding lock:
[  308.324664] b0960240 (fs_reclaim){+.+.}-{0:0}, at: 
__fs_reclaim_acquire+0x5/0x30

[  308.324666]
   which lock already depends on the new lock.

[  308.324667]
   the existing dependency chain (in reverse order) is:
[  308.324667]
   -> #1 (fs_reclaim){+.+.}-{0:0}:
[  308.324670]    fs_reclaim_acquire+0x34/0x40
[  308.324672]    dma_resv_lockdep+0x186/0x224
[  308.324675]    do_one_initcall+0x5d/0x2c0
[  308.324676]    kernel_init_freeable+0x222/0x288
[  308.324678]    kernel_init+0xa/0x107
[  308.324679]    ret_from_fork+0x1f/0x30
[  308.324680]
   -> #0 (>i_mmap_rwsem){}-{3:3}:
[  308.324682]    __lock_acquire+0x119f/0x1fc0
[  308.324683]    lock_acquire+0xa4/0x3b0
[  308.324685]    down_read+0x2d/0x110
[  308.324686]    rmap_walk_file+0x1c0/0x2f0
[  308.324687]    page_referenced+0x133/0x150
[  308.324689]    shrink_active_list+0x142/0x610
[  308.324690]    balance_pgdat+0x229/0x620
[  308.324691]    kswapd+0x200/0x470
[  308.324693]    kthread+0x11f/0x140
[  308.324694]    ret_from_fork+0x1f/0x30
[  308.324694]
   other info that might help us debug this:

[  308.324695]  Possible unsafe locking scenario:

[  308.324695]    CPU0    CPU1
[  308.324696]        
[  308.324696]   lock(fs_reclaim);
[  308.324697] lock(>i_mmap_rwsem);
[  308.324698]    lock(fs_reclaim);
[  308.324699]   lock(>i_mmap_rwsem);
[  308.324699]
    *** DEADLOCK ***

[  308.324700] 1 lock held by kswapd0/98:
[  308.324701]  #0: b0960240 (fs_reclaim){+.+.}-{0:0}, at: 
__fs_reclaim_acquire+0x5/0x30

[  308.324702]
   stack backtrace:
[  308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16
[  308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 07/29/2019

[  308.324706] Call Trace:
[  308.324710]  dump_stack+0x92/0xc8
[  308.324711]  check_noncircular+0x12d/0x150
[  308.324713]  __lock_acquire+0x119f/0x1fc0
[  308.324715]  lock_acquire+0xa4/0x3b0
[  308.324716]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324717]  ? __lock_acquire+0x394/0x1fc0
[  308.324719]  down_read+0x2d/0x110
[  308.324720]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324721]  rmap_walk_file+0x1c0/0x2f0
[  308.324722]  page_referenced+0x133/0x150
[  308.324724]  ? __page_set_anon_rmap+0x70/0x70
[  308.324725]  ? page_get_anon_vma+0x190/0x190
[  308.324726]  shrink_active_list+0x142/0x610
[  308.324728]  balance_pgdat+0x229/0x620
[  308.324730]  kswapd+0x200/0x470
[  308.324731]  ? lockdep_hardirqs_on_prepare+0xf5/0x170
[  308.324733]  ? finish_wait+0x80/0x80
[  308.324734]  ? balance_pgdat+0x620/0x620
[  

Re: [Intel-gfx] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

2020-07-30 Thread Daniel Vetter
On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
 wrote:
>
>
> On 7/28/20 3:58 PM, Daniel Vetter wrote:
> > GPU drivers need this in their shrinkers, to be able to throw out
> > mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
> > but that loop is resolved by trylocking in shrinkers.
> >
> > So full hierarchy is now (ignore some of the other branches we already
> > have primed):
> >
> > mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
> >
> > I hope that's not inconsistent with anything mm or fs does, adding
> > relevant people.
> >
> Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
> don't allocate any memory AFAICT.
>
> Since huge page-table-entry splitting may happen under the i_mmap_lock
> from unmap_mapping_range() it might be worth figuring out how new page
> directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all
i_mmap_lock_write/read callers. Found the following:

- kernel/events/uprobes.c in build_map_info:

/*
 * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
 * reclaim. This is optimistic, no harm done if it fails.
 */

- I got lost in the hugetlb.c code and couldn't convince myself it's
not allocating page directories at various levels with something else
than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the
hugepage code is too complex and flying over my head.
-Daniel

>
> /Thomas
>
>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

2020-07-30 Thread Intel



On 7/28/20 3:58 PM, Daniel Vetter wrote:

GPU drivers need this in their shrinkers, to be able to throw out
mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already
have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding
relevant people.

Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but 
don't allocate any memory AFAICT.


Since huge page-table-entry splitting may happen under the i_mmap_lock 
from unmap_mapping_range() it might be worth figuring out how new page 
directory pages are allocated, though.


/Thomas



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

2020-07-28 Thread Daniel Vetter
GPU drivers need this in their shrinkers, to be able to throw out
mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already
have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding
relevant people.

Signed-off-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: "Christian König" 
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc: Dave Chinner 
Cc: Qian Cai 
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: Thomas Hellström (Intel) 
Cc: Andrew Morton 
Cc: Jason Gunthorpe 
Cc: linux...@kvack.org
Cc: linux-r...@vger.kernel.org
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-resv.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 0e6675ec1d11..9678162a4ac5 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -104,12 +104,14 @@ static int __init dma_resv_lockdep(void)
struct mm_struct *mm = mm_alloc();
struct ww_acquire_ctx ctx;
struct dma_resv obj;
+   struct address_space mapping;
int ret;
 
if (!mm)
return -ENOMEM;
 
dma_resv_init();
+   address_space_init_once();
 
mmap_read_lock(mm);
ww_acquire_init(, _ww_class);
@@ -117,6 +119,9 @@ static int __init dma_resv_lockdep(void)
if (ret == -EDEADLK)
dma_resv_lock_slow(, );
fs_reclaim_acquire(GFP_KERNEL);
+   /* for unmap_mapping_range on trylocked buffer objects in shrinkers */
+   i_mmap_lock_write();
+   i_mmap_unlock_write();
 #ifdef CONFIG_MMU_NOTIFIER
lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
__dma_fence_might_wait();
-- 
2.27.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx