Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-17 Thread Boqun Feng
On Wed, Mar 17, 2021 at 10:54:17PM -0400, Waiman Long wrote:
> On 3/17/21 10:24 PM, Boqun Feng wrote:
> > Hi Waiman,
> > 
> > Just a question out of curiosity: how does this problem hide so long?
> > ;-) Because IIUC, both locktorture and ww_mutex_lock have been there for
> > a while, so why didn't we spot this earlier?
> > 
> > I ask just to make sure we don't introduce the problem because of some
> > subtle problems in lock(dep).
> > 
> You have to explicitly specify ww_mutex in the locktorture module parameter
> to run the test. ww_mutex is usually not the intended target of testing as
> there aren't that many places that use it. Even if someone run it, it
> probably is not on a debug kernel.
> 
> Our QA people try to run locktorture on ww_mutex and discover that.
> 

Got it. Thanks ;-)

Regards,
Boqun

> Cheers,
> Longman
> 


Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-17 Thread Waiman Long

On 3/17/21 10:24 PM, Boqun Feng wrote:

Hi Waiman,

Just a question out of curiosity: how does this problem hide so long?
;-) Because IIUC, both locktorture and ww_mutex_lock have been there for
a while, so why didn't we spot this earlier?

I ask just to make sure we don't introduce the problem because of some
subtle problems in lock(dep).

You have to explicitly specify ww_mutex in the locktorture module 
parameter to run the test. ww_mutex is usually not the intended target 
of testing as there aren't that many places that use it. Even if someone 
run it, it probably is not on a debug kernel.


Our QA people try to run locktorture on ww_mutex and discover that.

Cheers,
Longman



Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-17 Thread Boqun Feng
Hi Waiman,

Just a question out of curiosity: how does this problem hide so long?
;-) Because IIUC, both locktorture and ww_mutex_lock have been there for
a while, so why didn't we spot this earlier?

I ask just to make sure we don't introduce the problem because of some
subtle problems in lock(dep).

Regards,
Boqun

On Tue, Mar 16, 2021 at 11:31:18AM -0400, Waiman Long wrote:
> It was found that running the ww_mutex_lock-torture test produced the
> following lockdep splat almost immediately:
> 
> [  103.892638] ==
> [  103.892639] WARNING: possible circular locking dependency detected
> [  103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S  W
> [  103.892643] --
> [  103.892643] lock_torture_wr/3234 is trying to acquire lock:
> [  103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: 
> torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892660]
> [  103.892660] but task is already holding lock:
> [  103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: 
> torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
> [  103.892669]
> [  103.892669] which lock already depends on the new lock.
> [  103.892669]
> [  103.892670]
> [  103.892670] the existing dependency chain (in reverse order) is:
> [  103.892671]
> [  103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
> [  103.892675]lock_acquire+0x1c5/0x830
> [  103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892687]ww_mutex_lock+0x4b/0x180
> [  103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892694]lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892698]kthread+0x35f/0x430
> [  103.892701]ret_from_fork+0x1f/0x30
> [  103.892706]
> [  103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
> [  103.892709]lock_acquire+0x1c5/0x830
> [  103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892715]ww_mutex_lock+0x4b/0x180
> [  103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892721]lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892725]kthread+0x35f/0x430
> [  103.892727]ret_from_fork+0x1f/0x30
> [  103.892730]
> [  103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
> [  103.892733]check_prevs_add+0x3fd/0x2470
> [  103.892736]__lock_acquire+0x2602/0x3100
> [  103.892738]lock_acquire+0x1c5/0x830
> [  103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892743]ww_mutex_lock+0x4b/0x180
> [  103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892749]lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892753]kthread+0x35f/0x430
> [  103.892755]ret_from_fork+0x1f/0x30
> [  103.892757]
> [  103.892757] other info that might help us debug this:
> [  103.892757]
> [  103.892758] Chain exists of:
> [  103.892758]   torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> 
> torture_ww_mutex_0.base
> [  103.892758]
> [  103.892763]  Possible unsafe locking scenario:
> [  103.892763]
> [  103.892764]CPU0CPU1
> [  103.892765]
> [  103.892765]   lock(torture_ww_mutex_0.base);
> [  103.892767]  
> lock(torture_ww_mutex_1.base);
> [  103.892770]  
> lock(torture_ww_mutex_0.base);
> [  103.892772]   lock(torture_ww_mutex_2.base);
> [  103.892774]
> [  103.892774]  *** DEADLOCK ***
> 
> Since ww_mutex is supposed to be deadlock-proof if used properly, such
> deadlock scenario should not happen. To avoid this false positive splat,
> treat ww_mutex_lock() like a trylock().
> 
> After applying this patch, the locktorture test can run for a long time
> without triggering the circular locking dependency splat.
> 
> Signed-off-by: Waiman Long 
> ---
>  kernel/locking/mutex.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 622ebdfcd083..bb89393cd3a2 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, 
> unsigned int subclass,
>   }
>  
>   preempt_disable();
> - mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> + /*
> +  * Treat as trylock for ww_mutex.
> +  */
> + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
>  
>   if (__mutex_trylock(lock) ||
>   mutex_optimistic_spin(lock, ww_ctx, NULL)) {
> -- 
> 2.18.1
> 


Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-16 Thread Davidlohr Bueso

On Tue, 16 Mar 2021, Waiman Long wrote:


It was found that running the ww_mutex_lock-torture test produced the
following lockdep splat almost immediately:

[  103.892638] ==
[  103.892639] WARNING: possible circular locking dependency detected
[  103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S  W
[  103.892643] --
[  103.892643] lock_torture_wr/3234 is trying to acquire lock:
[  103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892660]
[  103.892660] but task is already holding lock:
[  103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
[  103.892669]
[  103.892669] which lock already depends on the new lock.
[  103.892669]
[  103.892670]
[  103.892670] the existing dependency chain (in reverse order) is:
[  103.892671]
[  103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
[  103.892675]lock_acquire+0x1c5/0x830
[  103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892687]ww_mutex_lock+0x4b/0x180
[  103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892694]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892698]kthread+0x35f/0x430
[  103.892701]ret_from_fork+0x1f/0x30
[  103.892706]
[  103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
[  103.892709]lock_acquire+0x1c5/0x830
[  103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892715]ww_mutex_lock+0x4b/0x180
[  103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892721]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892725]kthread+0x35f/0x430
[  103.892727]ret_from_fork+0x1f/0x30
[  103.892730]
[  103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
[  103.892733]check_prevs_add+0x3fd/0x2470
[  103.892736]__lock_acquire+0x2602/0x3100
[  103.892738]lock_acquire+0x1c5/0x830
[  103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892743]ww_mutex_lock+0x4b/0x180
[  103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892749]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892753]kthread+0x35f/0x430
[  103.892755]ret_from_fork+0x1f/0x30
[  103.892757]
[  103.892757] other info that might help us debug this:
[  103.892757]
[  103.892758] Chain exists of:
[  103.892758]   torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> 
torture_ww_mutex_0.base
[  103.892758]
[  103.892763]  Possible unsafe locking scenario:
[  103.892763]
[  103.892764]CPU0CPU1
[  103.892765]
[  103.892765]   lock(torture_ww_mutex_0.base);
[  103.892767]lock(torture_ww_mutex_1.base);
[  103.892770]lock(torture_ww_mutex_0.base);
[  103.892772]   lock(torture_ww_mutex_2.base);
[  103.892774]
[  103.892774]  *** DEADLOCK ***

Since ww_mutex is supposed to be deadlock-proof if used properly, such
deadlock scenario should not happen. To avoid this false positive splat,
treat ww_mutex_lock() like a trylock().

After applying this patch, the locktorture test can run for a long time
without triggering the circular locking dependency splat.

Signed-off-by: Waiman Long 


Acked-by Davidlohr Bueso 


[PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-16 Thread Waiman Long
It was found that running the ww_mutex_lock-torture test produced the
following lockdep splat almost immediately:

[  103.892638] ==
[  103.892639] WARNING: possible circular locking dependency detected
[  103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S  W
[  103.892643] --
[  103.892643] lock_torture_wr/3234 is trying to acquire lock:
[  103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892660]
[  103.892660] but task is already holding lock:
[  103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
[  103.892669]
[  103.892669] which lock already depends on the new lock.
[  103.892669]
[  103.892670]
[  103.892670] the existing dependency chain (in reverse order) is:
[  103.892671]
[  103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
[  103.892675]lock_acquire+0x1c5/0x830
[  103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892687]ww_mutex_lock+0x4b/0x180
[  103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892694]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892698]kthread+0x35f/0x430
[  103.892701]ret_from_fork+0x1f/0x30
[  103.892706]
[  103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
[  103.892709]lock_acquire+0x1c5/0x830
[  103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892715]ww_mutex_lock+0x4b/0x180
[  103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892721]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892725]kthread+0x35f/0x430
[  103.892727]ret_from_fork+0x1f/0x30
[  103.892730]
[  103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
[  103.892733]check_prevs_add+0x3fd/0x2470
[  103.892736]__lock_acquire+0x2602/0x3100
[  103.892738]lock_acquire+0x1c5/0x830
[  103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892743]ww_mutex_lock+0x4b/0x180
[  103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892749]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892753]kthread+0x35f/0x430
[  103.892755]ret_from_fork+0x1f/0x30
[  103.892757]
[  103.892757] other info that might help us debug this:
[  103.892757]
[  103.892758] Chain exists of:
[  103.892758]   torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> 
torture_ww_mutex_0.base
[  103.892758]
[  103.892763]  Possible unsafe locking scenario:
[  103.892763]
[  103.892764]CPU0CPU1
[  103.892765]
[  103.892765]   lock(torture_ww_mutex_0.base);
[  103.892767]lock(torture_ww_mutex_1.base);
[  103.892770]lock(torture_ww_mutex_0.base);
[  103.892772]   lock(torture_ww_mutex_2.base);
[  103.892774]
[  103.892774]  *** DEADLOCK ***

Since ww_mutex is supposed to be deadlock-proof if used properly, such
deadlock scenario should not happen. To avoid this false positive splat,
treat ww_mutex_lock() like a trylock().

After applying this patch, the locktorture test can run for a long time
without triggering the circular locking dependency splat.

Signed-off-by: Waiman Long 
---
 kernel/locking/mutex.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 622ebdfcd083..bb89393cd3a2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
}
 
preempt_disable();
-   mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
+   /*
+* Treat as trylock for ww_mutex.
+*/
+   mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
 
if (__mutex_trylock(lock) ||
mutex_optimistic_spin(lock, ww_ctx, NULL)) {
-- 
2.18.1