Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 04:26:48PM +, Chris Wilson wrote:
> On Wed, Mar 01, 2017 at 04:11:48PM +, Chris Wilson wrote:
> > On Wed, Mar 01, 2017 at 04:54:14PM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> > > > Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> > > > Add kselftests for ww_mutex stress", and found no "bad unlock balance
> > > > detected", but this warning. Attached is the new dmesg, which is a bit
> > > > large due to lots of repeated errors.
> > >
> > > So with all the various patches it works for me.
> > >
> > > I also have the following on top, which I did when I was looking through
> > > this code trying to figure out wth was happening.
> > >
> > > Chris, does this make sense to you?
> > >
> > > It makes each loop a fully new 'instance'; otherwise we'll never update
> > > the ww_class->stamp and the threads will always have the same order.
> >
> > Sounds ok. I just thought the stamp order of the threads was
> > immaterial - with each test doing a different sequence of locks and each
> > being identical in behaviour, it would not matter which had priority;
> > there would have to be some shuffling no matter what. However, for the
> > purpose of testing, having each iteration be a new locking instance does
> > make its behaviour more like a typical user.
>
> Correcting myself: the workers didn't reorder the locks, so changing the
> stamp does make the test more interesting.

OK, so I'll go write a Changelog for it then ;-) And stick your ACK on.
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 05:54:06PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 04:26:48PM +, Chris Wilson wrote:
> > On Wed, Mar 01, 2017 at 04:11:48PM +, Chris Wilson wrote:
> > > On Wed, Mar 01, 2017 at 04:54:14PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> > > > > Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> > > > > Add kselftests for ww_mutex stress", and found no "bad unlock balance
> > > > > detected", but this warning. Attached is the new dmesg, which is a bit
> > > > > large due to lots of repeated errors.
> > > >
> > > > So with all the various patches it works for me.
> > > >
> > > > I also have the following on top, which I did when I was looking through
> > > > this code trying to figure out wth was happening.
> > > >
> > > > Chris, does this make sense to you?
> > > >
> > > > It makes each loop a fully new 'instance'; otherwise we'll never update
> > > > the ww_class->stamp and the threads will always have the same order.
> > >
> > > Sounds ok. I just thought the stamp order of the threads was
> > > immaterial - with each test doing a different sequence of locks and each
> > > being identical in behaviour, it would not matter which had priority;
> > > there would have to be some shuffling no matter what. However, for the
> > > purpose of testing, having each iteration be a new locking instance does
> > > make its behaviour more like a typical user.
> >
> > Correcting myself: the workers didn't reorder the locks, so changing the
> > stamp does make the test more interesting.
>
> OK, so I'll go write a Changelog for it then ;-) And stick your ACK on.

With both patches in this thread, all 110 boots are successful w/o a
single warning.

Tested-by: Fengguang Wu

Thanks,
Fengguang
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 04:11:48PM +, Chris Wilson wrote:
> On Wed, Mar 01, 2017 at 04:54:14PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> > > Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> > > Add kselftests for ww_mutex stress", and found no "bad unlock balance
> > > detected", but this warning. Attached is the new dmesg, which is a bit
> > > large due to lots of repeated errors.
> >
> > So with all the various patches it works for me.
> >
> > I also have the following on top, which I did when I was looking through
> > this code trying to figure out wth was happening.
> >
> > Chris, does this make sense to you?
> >
> > It makes each loop a fully new 'instance'; otherwise we'll never update
> > the ww_class->stamp and the threads will always have the same order.
>
> Sounds ok. I just thought the stamp order of the threads was
> immaterial - with each test doing a different sequence of locks and each
> being identical in behaviour, it would not matter which had priority;
> there would have to be some shuffling no matter what. However, for the
> purpose of testing, having each iteration be a new locking instance does
> make its behaviour more like a typical user.

Correcting myself: the workers didn't reorder the locks, so changing the
stamp does make the test more interesting.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 04:54:14PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> > Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> > Add kselftests for ww_mutex stress", and found no "bad unlock balance
> > detected", but this warning. Attached is the new dmesg, which is a bit
> > large due to lots of repeated errors.
>
> So with all the various patches it works for me.
>
> I also have the following on top, which I did when I was looking through
> this code trying to figure out wth was happening.
>
> Chris, does this make sense to you?
>
> It makes each loop a fully new 'instance'; otherwise we'll never update
> the ww_class->stamp and the threads will always have the same order.

Sounds ok. I just thought the stamp order of the threads was
immaterial - with each test doing a different sequence of locks and each
being identical in behaviour, it would not matter which had priority;
there would have to be some shuffling no matter what. However, for the
purpose of testing, having each iteration be a new locking instance does
make its behaviour more like a typical user.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 04:51:36PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> > Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> > Add kselftests for ww_mutex stress", and found no "bad unlock balance
> > detected", but this warning. Attached is the new dmesg, which is a bit
> > large due to lots of repeated errors.
> >
> > [    9.105427] Freeing initrd memory: 24852K
> > [    9.121306] The force parameter has not been set to 1. The Iris poweroff handler will not be installed.
> > [    9.141216] NatSemi SCx200 Driver
> > [    9.724519] [ cut here ]
> > [    9.726795] WARNING: CPU: 0 PID: 22 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff+0x31/0x7a
> > [    9.738281] CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 4.10.0-rc3-00156-g7d81c50 #1
> > [    9.741977] Workqueue: test-ww_mutex test_cycle_work
> > [    9.745524] Call Trace:
> > [    9.747610]  dump_stack+0x16/0x18
> > [    9.754619]  __warn+0xa0/0xb7
> > [    9.757553]  ? __ww_mutex_wakeup_for_backoff+0x31/0x7a
> > [    9.760881]  warn_slowpath_null+0x11/0x16
> > [    9.765222]  __ww_mutex_wakeup_for_backoff+0x31/0x7a
> > [    9.768028]  __ww_mutex_lock+0x2f3/0xb63
> > [    9.770979]  ? wake_up_q+0x25/0x40
> > [    9.773044]  ? __might_sleep+0x6c/0x73
> > [    9.774890]  ww_mutex_lock+0x34/0x3b
> > [    9.776001]  ? test_cycle_work+0xf7/0x170
> > [    9.51]  test_cycle_work+0xf7/0x170
> > [    9.779036]  process_one_work+0x1c0/0x33a
> > [    9.780664]  ? process_one_work+0x168/0x33a
> > [    9.788924]  worker_thread+0x22f/0x315
> > [    9.791016]  kthread+0xed/0xf2
> > [    9.793255]  ? process_scheduled_works+0x24/0x24
> > [    9.795475]  ? __kthread_create_on_node+0x11f/0x11f
> > [    9.798741]  ? __kthread_create_on_node+0x11f/0x11f
> > [    9.802371]  ret_from_fork+0x19/0x30
> > [    9.804430] ---[ end trace 9036bbb174aed804 ]---
>
> Do you have the below patch in?

Nope. I'll re-test with it added.

Regards,
Fengguang

> ---
> commit b9c16a0e1f733c97e48798b2a9362c485bb3b731
> Author: Peter Zijlstra
> Date:   Tue Jan 17 16:06:09 2017 +0100
>
>     locking/mutex: Fix lockdep_assert_held() fail
>
>     In commit:
>
>       659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most one
>       waiter for backoff when acquiring the lock")
>
>     I replaced a comment with a lockdep_assert_held(). However it turns out
>     we hide that lock from lockdep for hysterical raisins, which results in
>     the assertion always firing.
>
>     Remove the old debug code as lockdep will easily spot the abuse it was
>     meant to catch, which will make the lock visible to lockdep and make the
>     assertion work as intended.
>
>     Reported-by: Mike Galbraith
>     Signed-off-by: Peter Zijlstra (Intel)
>     Cc: Linus Torvalds
>     Cc: Nicolai Haehnle
>     Cc: Peter Zijlstra
>     Cc: Thomas Gleixner
>     Fixes: 659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most
>     one waiter for backoff when acquiring the lock")
>     Link: http://lkml.kernel.org/r/20170117150609.GB32474@worktop
>     Signed-off-by: Ingo Molnar
>
> diff --git a/kernel/locking/mutex-debug.h b/kernel/locking/mutex-debug.h
> index a459faa48987..4174417d5309 100644
> --- a/kernel/locking/mutex-debug.h
> +++ b/kernel/locking/mutex-debug.h
> @@ -26,20 +26,3 @@ extern void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
>  extern void debug_mutex_unlock(struct mutex *lock);
>  extern void debug_mutex_init(struct mutex *lock, const char *name,
>  			     struct lock_class_key *key);
> -
> -#define spin_lock_mutex(lock, flags)				\
> -	do {							\
> -		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
> -								\
> -		DEBUG_LOCKS_WARN_ON(in_interrupt());		\
> -		local_irq_save(flags);				\
> -		arch_spin_lock(&(lock)->rlock.raw_lock);	\
> -		DEBUG_LOCKS_WARN_ON(l->magic != l);		\
> -	} while (0)
> -
> -#define spin_unlock_mutex(lock, flags)				\
> -	do {							\
> -		arch_spin_unlock(&(lock)->rlock.raw_lock);	\
> -		local_irq_restore(flags);			\
> -		preempt_check_resched();			\
> -	} while (0)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 935116723a3d..705e06fe5e6c 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -325,8 +325,6 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>  static __always_inline void
>  ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  {
> -	unsigned long flags;
> -
>  	ww_mutex_lock_acquired(lock, ctx);
>
>  	lock->ctx = ctx;
>
> @@ -350,9 +348,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  	 * Uh oh, we raced in fastpath, wake up everyone in this case,
>  	 * so they can see the new lock->ctx.
>  	 */
> -	s
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> Add kselftests for ww_mutex stress", and found no "bad unlock balance
> detected", but this warning. Attached is the new dmesg, which is a bit
> large due to lots of repeated errors.

So with all the various patches it works for me.

I also have the following on top, which I did when I was looking through
this code trying to figure out wth was happening.

Chris, does this make sense to you?

It makes each loop a fully new 'instance'; otherwise we'll never update
the ww_class->stamp and the threads will always have the same order.

---
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index da6c9a34f62f..d0fd06429c9d 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -398,12 +398,11 @@ static void stress_inorder_work(struct work_struct *work)
 	if (!order)
 		return;
 
-	ww_acquire_init(&ctx, &ww_class);
-
 	do {
 		int contended = -1;
 		int n, err;
 
+		ww_acquire_init(&ctx, &ww_class);
 retry:
 		err = 0;
 		for (n = 0; n < nlocks; n++) {
@@ -433,9 +432,9 @@ static void stress_inorder_work(struct work_struct *work)
 			       __func__, err);
 			break;
 		}
-	} while (--stress->nloops);
 
-	ww_acquire_fini(&ctx);
+		ww_acquire_fini(&ctx);
+	} while (--stress->nloops);
 
 	kfree(order);
 	kfree(stress);
@@ -470,9 +469,9 @@ static void stress_reorder_work(struct work_struct *work)
 	kfree(order);
 	order = NULL;
 
-	ww_acquire_init(&ctx, &ww_class);
-
 	do {
+		ww_acquire_init(&ctx, &ww_class);
+
 		list_for_each_entry(ll, &locks, link) {
 			err = ww_mutex_lock(ll->lock, &ctx);
 			if (!err)
@@ -495,9 +494,9 @@ static void stress_reorder_work(struct work_struct *work)
 		dummy_load(stress);
 		list_for_each_entry(ll, &locks, link)
 			ww_mutex_unlock(ll->lock);
-	} while (--stress->nloops);
 
-	ww_acquire_fini(&ctx);
+		ww_acquire_fini(&ctx);
+	} while (--stress->nloops);
 out:
 	list_for_each_entry_safe(ll, ln, &locks, link)
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 11:40:43PM +0800, Fengguang Wu wrote:
> Thanks for the patch! I applied the patch on top of "locking/ww_mutex:
> Add kselftests for ww_mutex stress", and found no "bad unlock balance
> detected", but this warning. Attached is the new dmesg, which is a bit
> large due to lots of repeated errors.
>
> [    9.105427] Freeing initrd memory: 24852K
> [    9.121306] The force parameter has not been set to 1. The Iris poweroff
> handler will not be installed.
> [    9.141216] NatSemi SCx200 Driver
> [    9.724519] [ cut here ]
> [    9.726795] WARNING: CPU: 0 PID: 22 at kernel/locking/mutex.c:305
> __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    9.738281] CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted
> 4.10.0-rc3-00156-g7d81c50 #1
> [    9.741977] Workqueue: test-ww_mutex test_cycle_work
> [    9.745524] Call Trace:
> [    9.747610]  dump_stack+0x16/0x18
> [    9.754619]  __warn+0xa0/0xb7
> [    9.757553]  ? __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    9.760881]  warn_slowpath_null+0x11/0x16
> [    9.765222]  __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    9.768028]  __ww_mutex_lock+0x2f3/0xb63
> [    9.770979]  ? wake_up_q+0x25/0x40
> [    9.773044]  ? __might_sleep+0x6c/0x73
> [    9.774890]  ww_mutex_lock+0x34/0x3b
> [    9.776001]  ? test_cycle_work+0xf7/0x170
> [    9.51]  test_cycle_work+0xf7/0x170
> [    9.779036]  process_one_work+0x1c0/0x33a
> [    9.780664]  ? process_one_work+0x168/0x33a
> [    9.788924]  worker_thread+0x22f/0x315
> [    9.791016]  kthread+0xed/0xf2
> [    9.793255]  ? process_scheduled_works+0x24/0x24
> [    9.795475]  ? __kthread_create_on_node+0x11f/0x11f
> [    9.798741]  ? __kthread_create_on_node+0x11f/0x11f
> [    9.802371]  ret_from_fork+0x19/0x30
> [    9.804430] ---[ end trace 9036bbb174aed804 ]---

Do you have the below patch in?
---
commit b9c16a0e1f733c97e48798b2a9362c485bb3b731
Author: Peter Zijlstra
Date:   Tue Jan 17 16:06:09 2017 +0100

    locking/mutex: Fix lockdep_assert_held() fail

    In commit:

      659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most one
      waiter for backoff when acquiring the lock")

    I replaced a comment with a lockdep_assert_held(). However it turns out
    we hide that lock from lockdep for hysterical raisins, which results in
    the assertion always firing.

    Remove the old debug code as lockdep will easily spot the abuse it was
    meant to catch, which will make the lock visible to lockdep and make the
    assertion work as intended.

    Reported-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Nicolai Haehnle
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most
    one waiter for backoff when acquiring the lock")
    Link: http://lkml.kernel.org/r/20170117150609.GB32474@worktop
    Signed-off-by: Ingo Molnar

diff --git a/kernel/locking/mutex-debug.h b/kernel/locking/mutex-debug.h
index a459faa48987..4174417d5309 100644
--- a/kernel/locking/mutex-debug.h
+++ b/kernel/locking/mutex-debug.h
@@ -26,20 +26,3 @@ extern void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 extern void debug_mutex_unlock(struct mutex *lock);
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
-
-#define spin_lock_mutex(lock, flags)				\
-	do {							\
-		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
-								\
-		DEBUG_LOCKS_WARN_ON(in_interrupt());		\
-		local_irq_save(flags);				\
-		arch_spin_lock(&(lock)->rlock.raw_lock);	\
-		DEBUG_LOCKS_WARN_ON(l->magic != l);		\
-	} while (0)
-
-#define spin_unlock_mutex(lock, flags)				\
-	do {							\
-		arch_spin_unlock(&(lock)->rlock.raw_lock);	\
-		local_irq_restore(flags);			\
-		preempt_check_resched();			\
-	} while (0)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 935116723a3d..705e06fe5e6c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -325,8 +325,6 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	unsigned long flags;
-
 	ww_mutex_lock_acquired(lock, ctx);

 	lock->ctx = ctx;

@@ -350,9 +348,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * Uh oh, we raced in fastpath, wake up everyone in this case,
 	 * so they can see the new lock->ctx.
 	 */
-	spin_lo
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Wed, Mar 01, 2017 at 11:01:38PM +0800, Boqun Feng wrote:
>
> In test-ww_mutex, the stress test will use 4096 locks, and given that
> held_lock::references only has 12 bits, I think this is the reason? I
> made a patch to reduce the lock number of the stress test, and it seems
> the problem is resolved.
>
> Thoughts?

Aaaahhh!! YES. I completely overlooked (and forgot) that ww_mutex used
that.

Let me also do a lockdep patch that complains on overflow there.

[ cut here ]
WARNING: CPU: 0 PID: 5 at ../kernel/locking/lockdep.c:3269 __lock_acquire.isra.7+0xf8/0x703
DEBUG_LOCKS_WARN_ON(hlock->references == (1 << 12)-1)
CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-01418-gfba4873-dirty #29
Workqueue: test-ww_mutex stress_inorder_work
Call Trace:
 dump_stack+0x16/0x18
 __warn+0xa0/0xb7
 ? __lock_acquire.isra.7+0xf8/0x703
 warn_slowpath_fmt+0x28/0x2d
 __lock_acquire.isra.7+0xf8/0x703
 lock_acquire+0x6c/0x95
 ? stress_inorder_work+0xbf/0x218
 __ww_mutex_lock.constprop.0+0x5c/0x848
 ? stress_inorder_work+0xbf/0x218
 ? __might_sleep+0x6c/0x73
 ww_mutex_lock+0x34/0x3b
 ? stress_inorder_work+0xbf/0x218
 stress_inorder_work+0xbf/0x218
 process_one_work+0x1c0/0x33a
 ? process_one_work+0x168/0x33a
 worker_thread+0x22f/0x315
 kthread+0xed/0xf2
 ? process_scheduled_works+0x24/0x24
 ? __kthread_create_on_node+0x11f/0x11f
 ret_from_fork+0x21/0x30
---[ end trace 326b6bfcd7100ae8 ]---

---
 kernel/locking/lockdep.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 9812e5dd409e..c0ee8607c11e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3260,10 +3260,17 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (depth) {
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
-			if (hlock->references)
+			if (hlock->references) {
+				/*
+				 * Check: unsigned int references:12, overflow.
+				 */
+				if (DEBUG_LOCKS_WARN_ON(hlock->references == (1 << 12)-1))
+					return 0;
+
 				hlock->references++;
-			else
+			} else {
 				hlock->references = 2;
+			}

 			return 1;
 		}
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Mon, Feb 27, 2017 at 11:35:43AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 27, 2017 at 11:28:24AM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 27, 2017 at 01:14:09PM +0800, Fengguang Wu wrote:
> > > Hello,
> > >
> > > This bisect result is not satisfactory; however, the bug is very
> > > reproducible and looks still alive in mainline & linux-next. You may
> > > try the attached reproduce-* script to debug it.
> >
> > OK, let me try that; however, see below.
>
> What that reproducer does is:
>
> [ 17.169056] =
> [ 17.171014] [ BUG: bad unlock balance detected! ]
> [ 17.172115] 4.10.0-10265-ge5d56ef #18 Not tainted
> [ 17.172115] -
> [ 17.172115] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
> [ 17.172115] [<41575474>] ww_mutex_unlock+0x66/0x72
> [ 17.172115] but there are no more locks to release!
> [ 17.172115]
> [ 17.172115] other info that might help us debug this:
> [ 17.172115]
> [ 17.172115] other info that might help us debug this:
> [ 17.172115] 4 locks held by kworker/u2:0/5:
> [ 17.172115]  #0:  ("test-ww_mutex"){..}, at: [<41041def>] process_one_work+0x168/0x33a
> [ 17.172115]  #1:  ((&stress->work)){..}, at: [<41041def>] process_one_work+0x168/0x33a
> [ 17.172115]  #2:  (ww_class_acquire){..}, at: [<41041e47>] process_one_work+0x1c0/0x33a
> [ 17.172115]  #3:  (ww_class_mutex){..}, at: [<41057d98>] stress_inorder_work+0xbf/0x218
> [ 17.172115]
> [ 17.172115] stack backtrace:
> [ 17.172115]
> [ 17.172115] stack backtrace:
> [ 17.172115] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-10265-ge5d56ef #18
> [ 17.172115] Workqueue: test-ww_mutex stress_inorder_work
> [ 17.172115] Call Trace:
> [ 17.172115]  dump_stack+0x16/0x18
> [ 17.172115]  print_unlock_imbalance_bug+0xb4/0xc1
> [ 17.172115]  ? ww_mutex_unlock+0x66/0x72
> [ 17.172115]  lock_release+0x28d/0x2bb
> [ 17.172115]  ? ww_mutex_unlock+0x66/0x72
> [ 17.172115]  __mutex_unlock_slowpath+0x2d/0x1db
> [ 17.172115]  ? schedule_hrtimeout_range+0xd/0xf
> [ 17.172115]  mutex_unlock+0xb/0xd
> [ 17.172115]  ww_mutex_unlock+0x66/0x72
> [ 17.172115]  stress_inorder_work+0x10a/0x218
> [ 17.172115]  process_one_work+0x1c0/0x33a
> [ 17.172115]  ? process_one_work+0x168/0x33a
> [ 17.172115]  worker_thread+0x22f/0x315
> [ 17.172115]  kthread+0xed/0xf2
> [ 17.172115]  ? process_scheduled_works+0x24/0x24
> [ 17.172115]  ? __kthread_create_on_node+0x11f/0x11f
> [ 17.172115]  ret_from_fork+0x21/0x30
>
> Which is an entirely different error.. Lemme look into that.

In test-ww_mutex, the stress test will use 4096 locks, and given that
held_lock::references only has 12 bits, I think this is the reason? I
made a patch to reduce the lock number of the stress test, and it seems
the problem is resolved.

Thoughts?

Regards,
Boqun

>8

From: Boqun Feng
Date: Wed, 1 Mar 2017 22:50:46 +0800
Subject: [PATCH] locking/ww_mutex: Adjust the lock number for stress test

Because there are only 12 bits in held_lock::references, we only support
4095 nested locks held at the same time. Adjust the lock number for the
ww_mutex stress test to kill this lockdep splat:

[ 17.169056] =
[ 17.171014] [ BUG: bad unlock balance detected! ]
[ 17.172115] 4.10.0-10265-ge5d56ef #18 Not tainted
[ 17.172115] -
[ 17.172115] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
[ 17.172115] [<41575474>] ww_mutex_unlock+0x66/0x72
[ 17.172115] but there are no more locks to release!
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115] 4 locks held by kworker/u2:0/5:
[ 17.172115]  #0:  ("test-ww_mutex"){..}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115]  #1:  ((&stress->work)){..}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115]  #2:  (ww_class_acquire){..}, at: [<41041e47>] process_one_work+0x1c0/0x33a
[ 17.172115]  #3:  (ww_class_mutex){..}, at: [<41057d98>] stress_inorder_work+0xbf/0x218
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-10265-ge5d56ef #18
[ 17.172115] Workqueue: test-ww_mutex stress_inorder_work
[ 17.172115] Call Trace:
[ 17.172115]  dump_stack+0x16/0x18
[ 17.172115]  print_unlock_imbalance_bug+0xb4/0xc1
[ 17.172115]  ? ww_mutex_unlock+0x66/0x72
[ 17.172115]  lock_release+0x28d/0x2bb
[ 17.172115]  ? ww_mutex_unlock+0x66/0x72
[ 17.172115]  __mutex_unlock_slowpath+0x2d/0x1db
[ 17.172115]  ? schedule_hrtimeout_range+0xd/0xf
[ 17.172115]  mutex_unlock+0xb/0xd
[ 17.172115]  ww_mutex_unlock+0x66/0x72
[ 17.172115]  stress_inorder_work+0x10a/0x218
[ 17.172115]  process_one_w
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Mon, Feb 27, 2017 at 11:28:24AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 27, 2017 at 01:14:09PM +0800, Fengguang Wu wrote:
> > Hello,
> >
> > This bisect result is not satisfactory; however, the bug is very
> > reproducible and looks still alive in mainline & linux-next. You may
> > try the attached reproduce-* script to debug it.
>
> OK, let me try that; however, see below.

What that reproducer does is:

[ 17.169056] =
[ 17.171014] [ BUG: bad unlock balance detected! ]
[ 17.172115] 4.10.0-10265-ge5d56ef #18 Not tainted
[ 17.172115] -
[ 17.172115] kworker/u2:0/5 is trying to release lock (ww_class_mutex) at:
[ 17.172115] [<41575474>] ww_mutex_unlock+0x66/0x72
[ 17.172115] but there are no more locks to release!
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115]
[ 17.172115] other info that might help us debug this:
[ 17.172115] 4 locks held by kworker/u2:0/5:
[ 17.172115]  #0:  ("test-ww_mutex"){..}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115]  #1:  ((&stress->work)){..}, at: [<41041def>] process_one_work+0x168/0x33a
[ 17.172115]  #2:  (ww_class_acquire){..}, at: [<41041e47>] process_one_work+0x1c0/0x33a
[ 17.172115]  #3:  (ww_class_mutex){..}, at: [<41057d98>] stress_inorder_work+0xbf/0x218
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115]
[ 17.172115] stack backtrace:
[ 17.172115] CPU: 0 PID: 5 Comm: kworker/u2:0 Not tainted 4.10.0-10265-ge5d56ef #18
[ 17.172115] Workqueue: test-ww_mutex stress_inorder_work
[ 17.172115] Call Trace:
[ 17.172115]  dump_stack+0x16/0x18
[ 17.172115]  print_unlock_imbalance_bug+0xb4/0xc1
[ 17.172115]  ? ww_mutex_unlock+0x66/0x72
[ 17.172115]  lock_release+0x28d/0x2bb
[ 17.172115]  ? ww_mutex_unlock+0x66/0x72
[ 17.172115]  __mutex_unlock_slowpath+0x2d/0x1db
[ 17.172115]  ? schedule_hrtimeout_range+0xd/0xf
[ 17.172115]  mutex_unlock+0xb/0xd
[ 17.172115]  ww_mutex_unlock+0x66/0x72
[ 17.172115]  stress_inorder_work+0x10a/0x218
[ 17.172115]  process_one_work+0x1c0/0x33a
[ 17.172115]  ? process_one_work+0x168/0x33a
[ 17.172115]  worker_thread+0x22f/0x315
[ 17.172115]  kthread+0xed/0xf2
[ 17.172115]  ? process_scheduled_works+0x24/0x24
[ 17.172115]  ? __kthread_create_on_node+0x11f/0x11f
[ 17.172115]  ret_from_fork+0x21/0x30

Which is an entirely different error.. Lemme look into that.
Re: [locking/ww_mutex] 2a0c112828 WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305 __ww_mutex_wakeup_for_backoff
On Mon, Feb 27, 2017 at 01:14:09PM +0800, Fengguang Wu wrote:
> Hello,
>
> This bisect result is not satisfactory; however, the bug is very
> reproducible and looks still alive in mainline & linux-next. You may
> try the attached reproduce-* script to debug it.

OK, let me try that; however, see below.

> commit 2a0c11282881875dc44f166a20eedf0d866dd0ef
> Author:     Chris Wilson
> AuthorDate: Thu Dec 1 11:47:10 2016 +
> Commit:     Ingo Molnar
> CommitDate: Sat Jan 14 11:37:17 2017 +0100
>
>     locking/ww_mutex: Add kselftests for ww_mutex stress
>
> [    1.057727] Freeing initrd memory: 3276K
> [    1.059257] The force parameter has not been set to 1. The Iris poweroff
> handler will not be installed.
> [    1.060266] NatSemi SCx200 Driver
> [    1.299666] random: fast init done
> [    1.568992] [ cut here ]
> [    1.569656] WARNING: CPU: 0 PID: 18 at kernel/locking/mutex.c:305
> __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    1.571194] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted
> 4.10.0-rc3-00155-g2a0c112 #1
> [    1.572259] Workqueue: events test_abba_work
> [    1.572833] Call Trace:
> [    1.573167]  dump_stack+0x16/0x18
> [    1.573617]  __warn+0xa0/0xb7
> [    1.574017]  ? __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    1.574705]  warn_slowpath_null+0x11/0x16
> [    1.575238]  __ww_mutex_wakeup_for_backoff+0x31/0x7a
> [    1.575897]  __ww_mutex_lock+0x2f3/0xb63
> [    1.576559]  ? wake_up_q+0x25/0x40
> [    1.577012]  ? __might_sleep+0x6c/0x73
> [    1.577518]  ww_mutex_lock+0x34/0x3b
> [    1.577996]  ? test_abba_work+0x109/0x182
> [    1.578535]  test_abba_work+0x109/0x182

That looks like the lockdep_assert_held(&lock->wait_lock) in
__ww_mutex_wakeup_for_backoff(). That should be solved by the below
commit, which is also present in both mainline and -next.
---
commit b9c16a0e1f733c97e48798b2a9362c485bb3b731
Author: Peter Zijlstra
Date:   Tue Jan 17 16:06:09 2017 +0100

    locking/mutex: Fix lockdep_assert_held() fail

    In commit:

      659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most one
      waiter for backoff when acquiring the lock")

    I replaced a comment with a lockdep_assert_held(). However it turns out
    we hide that lock from lockdep for hysterical raisins, which results in
    the assertion always firing.

    Remove the old debug code as lockdep will easily spot the abuse it was
    meant to catch, which will make the lock visible to lockdep and make the
    assertion work as intended.

    Reported-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Nicolai Haehnle
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 659cf9f5824a ("locking/ww_mutex: Optimize ww-mutexes by waking at most
    one waiter for backoff when acquiring the lock")
    Link: http://lkml.kernel.org/r/20170117150609.GB32474@worktop
    Signed-off-by: Ingo Molnar

diff --git a/kernel/locking/mutex-debug.h b/kernel/locking/mutex-debug.h
index a459faa48987..4174417d5309 100644
--- a/kernel/locking/mutex-debug.h
+++ b/kernel/locking/mutex-debug.h
@@ -26,20 +26,3 @@ extern void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 extern void debug_mutex_unlock(struct mutex *lock);
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
-
-#define spin_lock_mutex(lock, flags)				\
-	do {							\
-		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
-								\
-		DEBUG_LOCKS_WARN_ON(in_interrupt());		\
-		local_irq_save(flags);				\
-		arch_spin_lock(&(lock)->rlock.raw_lock);	\
-		DEBUG_LOCKS_WARN_ON(l->magic != l);		\
-	} while (0)
-
-#define spin_unlock_mutex(lock, flags)				\
-	do {							\
-		arch_spin_unlock(&(lock)->rlock.raw_lock);	\
-		local_irq_restore(flags);			\
-		preempt_check_resched();			\
-	} while (0)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 935116723a3d..705e06fe5e6c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -325,8 +325,6 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	unsigned long flags;
-
 	ww_mutex_lock_acquired(lock, ctx);

 	lock->ctx = ctx;

@@ -350,9 +348,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * Uh oh, we raced in fastpath, wake up everyone in this case,
 	 * so they can see the new lock->ctx.
 	 */
-	spin_lock_mute