Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock
On Wed, Mar 17, 2021 at 10:54:17PM -0400, Waiman Long wrote: > On 3/17/21 10:24 PM, Boqun Feng wrote: > > Hi Waiman, > > > > Just a question out of curiosity: how does this problem hide so long? > > ;-) Because IIUC, both locktorture and ww_mutex_lock have been there for > > a while, so why didn't we spot this earlier? > > > > I ask just to make sure we don't introduce the problem because of some > > subtle problems in lock(dep). > > > You have to explicitly specify ww_mutex in the locktorture module parameter > to run the test. ww_mutex is usually not the intended target of testing as > there aren't that many places that use it. Even if someone run it, it > probably is not on a debug kernel. > > Our QA people try to run locktorture on ww_mutex and discover that. > Got it. Thanks ;-) Regards, Boqun > Cheers, > Longman >
Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock
On 3/17/21 10:24 PM, Boqun Feng wrote: Hi Waiman, Just a question out of curiosity: how does this problem hide so long? ;-) Because IIUC, both locktorture and ww_mutex_lock have been there for a while, so why didn't we spot this earlier? I ask just to make sure we don't introduce the problem because of some subtle problems in lock(dep). You have to explicitly specify ww_mutex in the locktorture module parameter to run the test. ww_mutex is usually not the intended target of testing as there aren't that many places that use it. Even if someone run it, it probably is not on a debug kernel. Our QA people try to run locktorture on ww_mutex and discover that. Cheers, Longman
Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock
Hi Waiman, Just a question out of curiosity: how does this problem hide so long? ;-) Because IIUC, both locktorture and ww_mutex_lock have been there for a while, so why didn't we spot this earlier? I ask just to make sure we don't introduce the problem because of some subtle problems in lock(dep). Regards, Boqun On Tue, Mar 16, 2021 at 11:31:18AM -0400, Waiman Long wrote: > It was found that running the ww_mutex_lock-torture test produced the > following lockdep splat almost immediately: > > [ 103.892638] == > [ 103.892639] WARNING: possible circular locking dependency detected > [ 103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S W > [ 103.892643] -- > [ 103.892643] lock_torture_wr/3234 is trying to acquire lock: > [ 103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: > torture_ww_mutex_lock+0x316/0x720 [locktorture] > [ 103.892660] > [ 103.892660] but task is already holding lock: > [ 103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: > torture_ww_mutex_lock+0x3e2/0x720 [locktorture] > [ 103.892669] > [ 103.892669] which lock already depends on the new lock. > [ 103.892669] > [ 103.892670] > [ 103.892670] the existing dependency chain (in reverse order) is: > [ 103.892671] > [ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}: > [ 103.892675]lock_acquire+0x1c5/0x830 > [ 103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 > [ 103.892687]ww_mutex_lock+0x4b/0x180 > [ 103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture] > [ 103.892694]lock_torture_writer+0x142/0x3a0 [locktorture] > [ 103.892698]kthread+0x35f/0x430 > [ 103.892701]ret_from_fork+0x1f/0x30 > [ 103.892706] > [ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}: > [ 103.892709]lock_acquire+0x1c5/0x830 > [ 103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 > [ 103.892715]ww_mutex_lock+0x4b/0x180 > [ 103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture] > [ 103.892721]lock_torture_writer+0x142/0x3a0 [locktorture] > [ 103.892725]kthread+0x35f/0x430 > [ 103.892727]ret_from_fork+0x1f/0x30 > [ 103.892730] > [ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}: > [ 103.892733]check_prevs_add+0x3fd/0x2470 > [ 103.892736]__lock_acquire+0x2602/0x3100 > [ 103.892738]lock_acquire+0x1c5/0x830 > [ 103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 > [ 103.892743]ww_mutex_lock+0x4b/0x180 > [ 103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture] > [ 103.892749]lock_torture_writer+0x142/0x3a0 [locktorture] > [ 103.892753]kthread+0x35f/0x430 > [ 103.892755]ret_from_fork+0x1f/0x30 > [ 103.892757] > [ 103.892757] other info that might help us debug this: > [ 103.892757] > [ 103.892758] Chain exists of: > [ 103.892758] torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> > torture_ww_mutex_0.base > [ 103.892758] > [ 103.892763] Possible unsafe locking scenario: > [ 103.892763] > [ 103.892764]CPU0CPU1 > [ 103.892765] > [ 103.892765] lock(torture_ww_mutex_0.base); > [ 103.892767] > lock(torture_ww_mutex_1.base); > [ 103.892770] > lock(torture_ww_mutex_0.base); > [ 103.892772] lock(torture_ww_mutex_2.base); > [ 103.892774] > [ 103.892774] *** DEADLOCK *** > > Since ww_mutex is supposed to be deadlock-proof if used properly, such > deadlock scenario should not happen. To avoid this false positive splat, > treat ww_mutex_lock() like a trylock(). > > After applying this patch, the locktorture test can run for a long time > without triggering the circular locking dependency splat. > > Signed-off-by: Waiman Long > --- > kernel/locking/mutex.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c > index 622ebdfcd083..bb89393cd3a2 100644 > --- a/kernel/locking/mutex.c > +++ b/kernel/locking/mutex.c > @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, > unsigned int subclass, > } > > preempt_disable(); > - mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip); > + /* > + * Treat as trylock for ww_mutex. > + */ > + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip); > > if (__mutex_trylock(lock) || > mutex_optimistic_spin(lock, ww_ctx, NULL)) { > -- > 2.18.1 >
Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock
On Tue, 16 Mar 2021, Waiman Long wrote: It was found that running the ww_mutex_lock-torture test produced the following lockdep splat almost immediately: [ 103.892638] == [ 103.892639] WARNING: possible circular locking dependency detected [ 103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S W [ 103.892643] -- [ 103.892643] lock_torture_wr/3234 is trying to acquire lock: [ 103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892660] [ 103.892660] but task is already holding lock: [ 103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x3e2/0x720 [locktorture] [ 103.892669] [ 103.892669] which lock already depends on the new lock. [ 103.892669] [ 103.892670] [ 103.892670] the existing dependency chain (in reverse order) is: [ 103.892671] [ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}: [ 103.892675]lock_acquire+0x1c5/0x830 [ 103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892687]ww_mutex_lock+0x4b/0x180 [ 103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892694]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892698]kthread+0x35f/0x430 [ 103.892701]ret_from_fork+0x1f/0x30 [ 103.892706] [ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}: [ 103.892709]lock_acquire+0x1c5/0x830 [ 103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892715]ww_mutex_lock+0x4b/0x180 [ 103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892721]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892725]kthread+0x35f/0x430 [ 103.892727]ret_from_fork+0x1f/0x30 [ 103.892730] [ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}: [ 103.892733]check_prevs_add+0x3fd/0x2470 [ 103.892736]__lock_acquire+0x2602/0x3100 [ 103.892738]lock_acquire+0x1c5/0x830 [ 103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892743]ww_mutex_lock+0x4b/0x180 [ 103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892749]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892753]kthread+0x35f/0x430 [ 103.892755]ret_from_fork+0x1f/0x30 [ 103.892757] [ 103.892757] other info that might help us debug this: [ 103.892757] [ 103.892758] Chain exists of: [ 103.892758] torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> torture_ww_mutex_0.base [ 103.892758] [ 103.892763] Possible unsafe locking scenario: [ 103.892763] [ 103.892764]CPU0CPU1 [ 103.892765] [ 103.892765] lock(torture_ww_mutex_0.base); [ 103.892767]lock(torture_ww_mutex_1.base); [ 103.892770]lock(torture_ww_mutex_0.base); [ 103.892772] lock(torture_ww_mutex_2.base); [ 103.892774] [ 103.892774] *** DEADLOCK *** Since ww_mutex is supposed to be deadlock-proof if used properly, such deadlock scenario should not happen. To avoid this false positive splat, treat ww_mutex_lock() like a trylock(). After applying this patch, the locktorture test can run for a long time without triggering the circular locking dependency splat. Signed-off-by: Waiman Long Acked-by Davidlohr Bueso
[PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock
It was found that running the ww_mutex_lock-torture test produced the following lockdep splat almost immediately: [ 103.892638] == [ 103.892639] WARNING: possible circular locking dependency detected [ 103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S W [ 103.892643] -- [ 103.892643] lock_torture_wr/3234 is trying to acquire lock: [ 103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892660] [ 103.892660] but task is already holding lock: [ 103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x3e2/0x720 [locktorture] [ 103.892669] [ 103.892669] which lock already depends on the new lock. [ 103.892669] [ 103.892670] [ 103.892670] the existing dependency chain (in reverse order) is: [ 103.892671] [ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}: [ 103.892675]lock_acquire+0x1c5/0x830 [ 103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892687]ww_mutex_lock+0x4b/0x180 [ 103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892694]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892698]kthread+0x35f/0x430 [ 103.892701]ret_from_fork+0x1f/0x30 [ 103.892706] [ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}: [ 103.892709]lock_acquire+0x1c5/0x830 [ 103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892715]ww_mutex_lock+0x4b/0x180 [ 103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892721]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892725]kthread+0x35f/0x430 [ 103.892727]ret_from_fork+0x1f/0x30 [ 103.892730] [ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}: [ 103.892733]check_prevs_add+0x3fd/0x2470 [ 103.892736]__lock_acquire+0x2602/0x3100 [ 103.892738]lock_acquire+0x1c5/0x830 [ 103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50 [ 103.892743]ww_mutex_lock+0x4b/0x180 [ 103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture] [ 103.892749]lock_torture_writer+0x142/0x3a0 [locktorture] [ 103.892753]kthread+0x35f/0x430 [ 103.892755]ret_from_fork+0x1f/0x30 [ 103.892757] [ 103.892757] other info that might help us debug this: [ 103.892757] [ 103.892758] Chain exists of: [ 103.892758] torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> torture_ww_mutex_0.base [ 103.892758] [ 103.892763] Possible unsafe locking scenario: [ 103.892763] [ 103.892764]CPU0CPU1 [ 103.892765] [ 103.892765] lock(torture_ww_mutex_0.base); [ 103.892767]lock(torture_ww_mutex_1.base); [ 103.892770]lock(torture_ww_mutex_0.base); [ 103.892772] lock(torture_ww_mutex_2.base); [ 103.892774] [ 103.892774] *** DEADLOCK *** Since ww_mutex is supposed to be deadlock-proof if used properly, such deadlock scenario should not happen. To avoid this false positive splat, treat ww_mutex_lock() like a trylock(). After applying this patch, the locktorture test can run for a long time without triggering the circular locking dependency splat. Signed-off-by: Waiman Long --- kernel/locking/mutex.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 622ebdfcd083..bb89393cd3a2 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, } preempt_disable(); - mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip); + /* +* Treat as trylock for ww_mutex. +*/ + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip); if (__mutex_trylock(lock) || mutex_optimistic_spin(lock, ww_ctx, NULL)) { -- 2.18.1