Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
Hello, Tomeu. On Tue, Mar 03, 2015 at 11:00:43AM +0100, Tomeu Vizoso wrote: ... > [7.421803] PC is at wake_bit_function+0x18/0x6c > [7.428168] LR is at __wake_up_common+0x5c/0x90 ... > [7.673183] [] (wake_bit_function) from [] > (__wake_up_common+0x5c/0x90) > [7.683300] [] (__wake_up_common) from [] > (__wake_up+0x48/0x5c) > [7.692744] [] (__wake_up) from [] > (__cancel_work_timer+0xe8/0x1ac) > [7.702533] [] (__cancel_work_timer) from [] > (cancel_work_sync+0x1c/0x20) > [7.712854] [] (cancel_work_sync) from [] > (css_free_work_fn+0x174/0x2ec) > [7.723099] [] (css_free_work_fn) from [] > (process_one_work+0x15c/0x3dc) > [7.79] [] (process_one_work) from [] > (worker_thread+0x54/0x4e8) > [7.743224] [] (worker_thread) from [] > (kthread+0xec/0x104) > [7.752339] [] (kthread) from [] > (ret_from_fork+0x14/0x34) > [7.761366] Code: e24cb004 e52de004 e8bd4000 e510400c (e5935000) Hah, weird. How is your machine ending up in wake_bit_function() from there? Can you please apply the following patch and report the dmesg after crash? Thanks. --- kernel/workqueue.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2755,8 +2755,12 @@ static bool __cancel_work_timer(struct w if (unlikely(ret == -ENOENT)) { DEFINE_WAIT(wait); prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE); - if (work_is_canceling(work)) + WARN_ON_ONCE(wait.func != autoremove_wake_function); + if (work_is_canceling(work)) { + printk("XXX __cancel_work_timer: %d sleeping w/ %pf, wait=%p\n", + task_pid_nr(current), wait.func, &wait); schedule(); + } finish_wait(waitq, &wait); } } while (unlikely(ret < 0)); @@ -2774,8 +2778,16 @@ static bool __cancel_work_timer(struct w * visible there. */ smp_mb(); - if (waitqueue_active(waitq)) + if (waitqueue_active(waitq)) { + wait_queue_t *cur; + + spin_lock_irq(&waitq->lock); + list_for_each_entry(cur, &waitq->task_list, task_list) + printk("XXX __cancel_work_timer: waking up %pf, wait=%p\n", + cur->func, cur); + spin_unlock_irq(&waitq->lock); wake_up(waitq); + } return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On 2 March 2015 at 17:21, Tejun Heo wrote: > On Mon, Mar 02, 2015 at 01:26:15PM +0100, Jesper Nilsson wrote: >> On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: >> > Hello, >> >> Hi! >> >> > This patch removes the possible hang by updating __cancel_work_timer() >> > to explicitly wait for clearing of CANCELING rather than invoking >> > flush_work() after try_to_grab_pending() fails with -ENOENT. The >> > explicit wait uses the matching bit waitqueue for the CANCELING bit. >> > >> > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com >> > >> > Signed-off-by: Tejun Heo >> > Reported-by: Rabin Vincent >> > Cc: sta...@vger.kernel.org >> >> What's the status on this patch, it's not in 4.0-rc1 at least? >> Is it queued for the 3.18 stable branch? > > Sorry about the delay. Applied to wq/for-4.0-fixes. Will push out in > a week or so. Hello, I'm getting this during almost every boot this morning, after rebasing on today's linux-next. Reverting this patch makes the issue go away. This has been tested on a Tegra 124-based Acer Chromebook 13, running a Debian derivative (I mention this because I see that in some test farms the boot succeeded on similar hw, but they probably have a simpler userspace, eg !systemd). [7.358239] Unable to handle kernel NULL pointer dereference at virtual address [7.368225] pgd = c0204000 [7.372693] [] *pgd= [7.378031] Internal error: Oops: 17 [#1] SMP ARM [7.384486] Modules linked in: ipv6 [7.389738] CPU: 1 PID: 110 Comm: kworker/1:2 Not tainted 4.0.0-rc1-next-20150303ccu #568 [7.399687] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) [7.407736] Workqueue: cgroup_destroy css_free_work_fn [7.414645] task: ecfe8e40 ti: eb9e task.ti: eb9e [7.421803] PC is at wake_bit_function+0x18/0x6c [7.428168] LR is at __wake_up_common+0x5c/0x90 [7.434433] pc : []lr : []psr: 200f0093 [7.434433] sp : eb9e1df0 ip : eb9e1e08 fp : eb9e1e04 [7.449379] r10: 0001 r9 : 0003 r8 : [7.456331] r7 : r6 : ee837a28 r5 : 0001 r4 : eeda8ee0 [7.464580] r3 : r2 : r1 : 0003 r0 : ebb19df4 [7.472825] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [7.481952] Control: 10c5387d Table: abb9406a DAC: 0015 [7.489427] Process kworker/1:2 (pid: 110, stack limit = 0xeb9e0220) [7.497524] Stack: (0xeb9e1df0 to 0xeb9e2000) [7.503616] 1de0: ee837a1c 0001 eb9e1e34 eb9e1e08 [7.513549] 1e00: c028ed3c c028f540 ee837a24 800f0013 0001 0003 [7.523490] 1e20: eb9e1e64 eb9e1e38 c028ef88 c028ecec c026d274 [7.533438] 1e40: eb9e1e74 0011 eb932fb4 ee837a24 c028f4f0 eb9e1eac eb9e1e68 [7.543390] 1e60: c026fdfc c028ef4c 600f0013 eb9e1e88 eb9e1e88 eb9e1e74 eb9e1e74 0006 [7.553357] 1e80: ebaf2a80 eb932f88 eb932f00 eb932f90 c11f5638 ee7f7005 [7.563329] 1ea0: eb9e1ebc eb9e1eb0 c026fedc c026fd20 eb9e1ee4 eb9e1ec0 c02ce3bc c026fecc [7.573310] 1ec0: eb932f50 ecf81080 c11b0338 ee7f33c0 ee7f7000 eb9e1f24 eb9e1ee8 [7.583296] 1ee0: c026f4d0 c02ce254 ee7f33c0 ee7f33d4 eb9e ecf81080 ee7f33c0 [7.593289] 1f00: ecf81098 ee7f33d4 eb9e 0008 ecf81080 ee7f33c0 eb9e1f5c eb9e1f28 [7.603295] 1f20: c026ff6c c026f380 c026ff18 c10ae100 eb9c0180 ecf81080 [7.613305] 1f40: c026ff18 eb9e1fac eb9e1f60 c02749d8 c026ff24 [7.623321] 1f60: ecf81080 eb9e1f78 eb9e1f78 [7.67] 1f80: eb9e1f88 eb9e1f88 eb9c0180 c02748ec [7.643330] 1fa0: eb9e1fb0 c0210aa0 c02748f8 [7.653299] 1fc0: [7.663248] 1fe0: 0013 [7.673183] [] (wake_bit_function) from [] (__wake_up_common+0x5c/0x90) [7.683300] [] (__wake_up_common) from [] (__wake_up+0x48/0x5c) [7.692744] [] (__wake_up) from [] (__cancel_work_timer+0xe8/0x1ac) [7.702533] [] (__cancel_work_timer) from [] (cancel_work_sync+0x1c/0x20) [7.712854] [] (cancel_work_sync) from [] (css_free_work_fn+0x174/0x2ec) [7.723099] [] (css_free_work_fn) from [] (process_one_work+0x15c/0x3dc) [7.79] [] (process_one_work) from [] (worker_thread+0x54/0x4e8) [7.743224] [] (worker_thread) from [] (kthread+0xec/0x104) [7.752339] [] (kthread) from [] (ret_from_fork+0x14/0x34) [7.761366] Code: e24cb004 e52de004 e8bd4000 e510400c (e5935000) [7.769273] ---[ end trace f25fc65c3d66034c ]--- [7.778675] Unable to handle kernel paging request at virtual address ffec [7.787735] pgd = c0204000 [7.792274] [ffec] *pgd=af7fd821, *pte=, *ppte=000
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Mar 02, 2015 at 11:21:44AM -0500, Tejun Heo wrote: > On Mon, Mar 02, 2015 at 01:26:15PM +0100, Jesper Nilsson wrote: > > On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > > > Hello, > > > > Hi! > > > > > This patch removes the possible hang by updating __cancel_work_timer() > > > to explicitly wait for clearing of CANCELING rather than invoking > > > flush_work() after try_to_grab_pending() fails with -ENOENT. The > > > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > > > > > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com > > > > > > Signed-off-by: Tejun Heo > > > Reported-by: Rabin Vincent > > > Cc: sta...@vger.kernel.org > > > > What's the status on this patch, it's not in 4.0-rc1 at least? > > Is it queued for the 3.18 stable branch? > > Sorry about the delay. Applied to wq/for-4.0-fixes. Will push out in > a week or so. Thanks, that works great. Just wanted to know. :-) > Thanks. > > -- > tejun /^JN - Jesper Nilsson -- Jesper Nilsson -- jesper.nils...@axis.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Mar 02, 2015 at 01:26:15PM +0100, Jesper Nilsson wrote: > On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > > Hello, > > Hi! > > > This patch removes the possible hang by updating __cancel_work_timer() > > to explicitly wait for clearing of CANCELING rather than invoking > > flush_work() after try_to_grab_pending() fails with -ENOENT. The > > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > > > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com > > > > Signed-off-by: Tejun Heo > > Reported-by: Rabin Vincent > > Cc: sta...@vger.kernel.org > > What's the status on this patch, it's not in 4.0-rc1 at least? > Is it queued for the 3.18 stable branch? Sorry about the delay. Applied to wq/for-4.0-fixes. Will push out in a week or so. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > Hello, Hi! > This patch removes the possible hang by updating __cancel_work_timer() > to explicitly wait for clearing of CANCELING rather than invoking > flush_work() after try_to_grab_pending() fails with -ENOENT. The > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com > > Signed-off-by: Tejun Heo > Reported-by: Rabin Vincent > Cc: sta...@vger.kernel.org What's the status on this patch, it's not in 4.0-rc1 at least? Is it queued for the 3.18 stable branch? Best regards, /^JN - Jesper Nilsson -- Jesper Nilsson -- jesper.nils...@axis.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > This patch removes the possible hang by updating __cancel_work_timer() > to explicitly wait for clearing of CANCELING rather than invoking > flush_work() after try_to_grab_pending() fails with -ENOENT. The > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com > > Signed-off-by: Tejun Heo Fixes my synthetic test: Tested-by: Rabin Vincent Thanks. /Rabin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Feb 09, 2015 at 05:29:55PM +0100, Jesper Nilsson wrote: > On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > > Hello, > > Hi, > > > Can you please verify that the following patch fixes the issue? > > Rabin will have to report if it fixes it for his synthetic case, > but I'll try it in my "real-world" jffs2 sync problem, and report > after a couple of hours. I let the test run for the night, and got no failures. Before I got at most an hour of tests before lockup. Tested-by: Jesper Nilsson /^JN - Jesper Nilsson -- Jesper Nilsson -- jesper.nils...@axis.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote: > Hello, Hi, > Can you please verify that the following patch fixes the issue? Rabin will have to report if it fixes it for his synthetic case, but I'll try it in my "real-world" jffs2 sync problem, and report after a couple of hours. > Thanks. /Jesper > 8< > cancel[_delayed]_work_sync() are implemented using > __cancel_work_timer() which grabs the PENDING bit using > try_to_grab_pending() and then flushes the work item with PENDING set > to prevent the on-going execution of the work item from requeueing > itself. > > try_to_grab_pending() can always grab PENDING bit without blocking > except when someone else is doing the above flushing during > cancelation. In that case, try_to_grab_pending() returns -ENOENT. In > this case, __cancel_work_timer() currently invokes flush_work(). The > assumption is that the completion of the work item is what the other > canceling task would be waiting for too and thus waiting for the same > condition and retrying should allow forward progress without excessive > busy looping > > Unfortunately, this doesn't work if preemption is disabled or the > latter task has real time priority. Let's say task A just got woken > up from flush_work() by the completion of the target work item. If, > before task A starts executing, task B gets scheduled and invokes > __cancel_work_timer() on the same work item, its try_to_grab_pending() > will return -ENOENT as the work item is still being canceled by task A > and flush_work() will also immediately return false as the work item > is no longer executing. This puts task B in a busy loop possibly > preventing task A from executing and clearing the canceling state on > the work item leading to a hang. > > task Atask B worker > > executing work > __cancel_work_timer() > try_to_grab_pending() > set work CANCELING > flush_work() > block for work completion > completion, wakes up A > __cancel_work_timer() > while (forever) { > try_to_grab_pending() > -ENOENT as work is being canceled > flush_work() > false as work is no longer executing > } > > This patch removes the possible hang by updating __cancel_work_timer() > to explicitly wait for clearing of CANCELING rather than invoking > flush_work() after try_to_grab_pending() fails with -ENOENT. The > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com > > Signed-off-by: Tejun Heo > Reported-by: Rabin Vincent > Cc: sta...@vger.kernel.org > --- > include/linux/workqueue.h |3 ++- > kernel/workqueue.c| 36 > 2 files changed, 34 insertions(+), 5 deletions(-) > > --- a/include/linux/workqueue.h > +++ b/include/linux/workqueue.h > @@ -70,7 +70,8 @@ enum { > /* data contains off-queue information when !WORK_STRUCT_PWQ */ > WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT, > > - WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE), > + __WORK_OFFQ_CANCELING = WORK_OFFQ_FLAG_BASE, > + WORK_OFFQ_CANCELING = (1 << __WORK_OFFQ_CANCELING), > > /* >* When a work item is off queue, its high bits point to the last > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -2730,17 +2730,35 @@ EXPORT_SYMBOL_GPL(flush_work); > > static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) > { > + wait_queue_head_t *waitq = bit_waitqueue(&work->data, > + __WORK_OFFQ_CANCELING); > unsigned long flags; > int ret; > > do { > ret = try_to_grab_pending(work, is_dwork, &flags); > /* > - * If someone else is canceling, wait for the same event it > - * would be waiting for before retrying. > + * If someone else is already canceling, wait for it to > + * finish. flush_work() doesn't work for PREEMPT_NONE > + * because we may get scheduled between @work's completion > + * and the other canceling task resuming and clearing > + * CANCELING - flush_work() will return false immediately > + * as @work is no longer busy, try_to_grab_pending() will > + * return -ENOENT as @work is still being canceled and the > + * other canceling task won't be able to clear CANCELING as > + * we're hogging the CPU. > + * > + * Explicitly wait for completion using a bit waitqueue. > + * We can't use wait_on_bit() as the CANCELING bit may get > + * recycled to point
[PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
Hello, Can you please verify that the following patch fixes the issue? Thanks. 8< cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. The explicit wait uses the matching bit waitqueue for the CANCELING bit. Link: http://lkml.kernel.org/g/20150206171156.ga8...@axis.com Signed-off-by: Tejun Heo Reported-by: Rabin Vincent Cc: sta...@vger.kernel.org --- include/linux/workqueue.h |3 ++- kernel/workqueue.c| 36 2 files changed, 34 insertions(+), 5 deletions(-) --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -70,7 +70,8 @@ enum { /* data contains off-queue information when !WORK_STRUCT_PWQ */ WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT, - WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE), + __WORK_OFFQ_CANCELING = WORK_OFFQ_FLAG_BASE, + WORK_OFFQ_CANCELING = (1 << __WORK_OFFQ_CANCELING), /* * When a work item is off queue, its high bits point to the last --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2730,17 +2730,35 @@ EXPORT_SYMBOL_GPL(flush_work); static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) { + wait_queue_head_t *waitq = bit_waitqueue(&work->data, +__WORK_OFFQ_CANCELING); unsigned long flags; int ret; do { ret = try_to_grab_pending(work, is_dwork, &flags); /* -* If someone else is canceling, wait for the same event it -* would be waiting for before retrying. +* If someone else is already canceling, wait for it to +* finish. flush_work() doesn't work for PREEMPT_NONE +* because we may get scheduled between @work's completion +* and the other canceling task resuming and clearing +* CANCELING - flush_work() will return false immediately +* as @work is no longer busy, try_to_grab_pending() will +* return -ENOENT as @work is still being canceled and the +* other canceling task won't be able to clear CANCELING as +* we're hogging the CPU. +* +* Explicitly wait for completion using a bit waitqueue. +* We can't use wait_on_bit() as the CANCELING bit may get +* recycled to point to pwq if @work gets re-queued. */ - if (unlikely(ret == -ENOENT)) - flush_work(work); + if (unlikely(ret == -ENOENT)) { + DEFINE_WAIT(wait); + prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE); + if (work_is_canceling(work)) +