Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-24 Thread Roman Penyaev
On Fri, Oct 21, 2016 at 5:47 PM, Oleg Nesterov  wrote:
> On 10/20, Andy Lutomirski wrote:
>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>> >  * If a worker went to sleep, notify and ask 
>> > workqueue
>> >  * whether it wants to wake up a task to maintain
>> >  * concurrency.
>> > +*
>> > +* Also the following stack is possible:
>> > +*oops_end()
>> > +*do_exit()
>> > +*schedule()
>> > +*
>> > +* If panic_on_oops is not set and oops happens on
>> > +* a workqueue execution path, thread will be 
>> > killed.
>> > +* That is definitly sad, but not to make the 
>> > situation
>> > +* even worse we have to ignore dead tasks in 
>> > order not
>> > +* to step on zeroed out members (e.g. 
>> > t->vfork_done is
>> > +* already NULL on that path, since we were called 
>> > by
>> > +* do_exit()))
>
> And we have more problems like this. Say, if blk_flush_plug_list()
> crashes it will likely crash again and again recursively.

I will send a patch if reproduce it :)

>
>> >  */
>> > -   if (prev->flags & PF_WQ_WORKER) {
>> > +   if (prev->flags & PF_WQ_WORKER &&
>> > +   prev->state != TASK_DEAD) {
>
> I don't think we should change __schedule()... Can't we simply clear
> PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
> into wq_worker_sleeping() and wq_worker_waking_up().

Yeah, probably handling this corner case in wq_worker_sleeping() func
is much better.

>
> Or perhaps something like the change below.

That's a nice stuff, thanks Oleg. I simply did not know about these
callbacks.

But the huge problem is that after commit 2deb4be28 by Andy Lutomirski
we can't use stack when we are already in do_exit().  And putting this
callback head inside a worker structure is a bloat.  I will resend this
with a simple task state check in a wq_worker_sleeping().

--
Roman


>
> Oleg.
>
> --- x/kernel/workqueue.c
> +++ x/kernel/workqueue.c
> @@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
> }
>  }
>
> +static void oops_handler(struct callback_head *oops_work)
> +{
> +   if (!(current->flags & PF_WQ_WORKER))
> +   return;
> +
> +   clear PF_WQ_WORKER, probably do more cleanups
> +}
> +
>  /**
>   * worker_thread - the worker thread function
>   * @__worker: self
> @@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
>   */
>  static int worker_thread(void *__worker)
>  {
> +   struct callback_head oops_work;
> struct worker *worker = __worker;
> struct worker_pool *pool = worker->pool;
>
> /* tell the scheduler that this is a workqueue worker */
> worker->task->flags |= PF_WQ_WORKER;
> +   init_task_work(_work, oops_handler);
> +   task_work_add(current, _work, false);
>  woke_up:
> spin_lock_irq(>lock);
>
>


Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-24 Thread Roman Penyaev
On Fri, Oct 21, 2016 at 5:47 PM, Oleg Nesterov  wrote:
> On 10/20, Andy Lutomirski wrote:
>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>> >  * If a worker went to sleep, notify and ask 
>> > workqueue
>> >  * whether it wants to wake up a task to maintain
>> >  * concurrency.
>> > +*
>> > +* Also the following stack is possible:
>> > +*oops_end()
>> > +*do_exit()
>> > +*schedule()
>> > +*
>> > +* If panic_on_oops is not set and oops happens on
>> > +* a workqueue execution path, thread will be 
>> > killed.
>> > +* That is definitly sad, but not to make the 
>> > situation
>> > +* even worse we have to ignore dead tasks in 
>> > order not
>> > +* to step on zeroed out members (e.g. 
>> > t->vfork_done is
>> > +* already NULL on that path, since we were called 
>> > by
>> > +* do_exit()))
>
> And we have more problems like this. Say, if blk_flush_plug_list()
> crashes it will likely crash again and again recursively.

I will send a patch if reproduce it :)

>
>> >  */
>> > -   if (prev->flags & PF_WQ_WORKER) {
>> > +   if (prev->flags & PF_WQ_WORKER &&
>> > +   prev->state != TASK_DEAD) {
>
> I don't think we should change __schedule()... Can't we simply clear
> PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
> into wq_worker_sleeping() and wq_worker_waking_up().

Yeah, probably handling this corner case in wq_worker_sleeping() func
is much better.

>
> Or perhaps something like the change below.

That's a nice stuff, thanks Oleg. I simply did not know about these
callbacks.

But the huge problem is that after commit 2deb4be28 by Andy Lutomirski
we can't use stack when we are already in do_exit().  And putting this
callback head inside a worker structure is a bloat.  I will resend this
with a simple task state check in a wq_worker_sleeping().

--
Roman


>
> Oleg.
>
> --- x/kernel/workqueue.c
> +++ x/kernel/workqueue.c
> @@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
> }
>  }
>
> +static void oops_handler(struct callback_head *oops_work)
> +{
> +   if (!(current->flags & PF_WQ_WORKER))
> +   return;
> +
> +   clear PF_WQ_WORKER, probably do more cleanups
> +}
> +
>  /**
>   * worker_thread - the worker thread function
>   * @__worker: self
> @@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
>   */
>  static int worker_thread(void *__worker)
>  {
> +   struct callback_head oops_work;
> struct worker *worker = __worker;
> struct worker_pool *pool = worker->pool;
>
> /* tell the scheduler that this is a workqueue worker */
> worker->task->flags |= PF_WQ_WORKER;
> +   init_task_work(_work, oops_handler);
> +   task_work_add(current, _work, false);
>  woke_up:
> spin_lock_irq(>lock);
>
>


Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-21 Thread Oleg Nesterov
On 10/20, Andy Lutomirski wrote:
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
> >  * If a worker went to sleep, notify and ask 
> > workqueue
> >  * whether it wants to wake up a task to maintain
> >  * concurrency.
> > +*
> > +* Also the following stack is possible:
> > +*oops_end()
> > +*do_exit()
> > +*schedule()
> > +*
> > +* If panic_on_oops is not set and oops happens on
> > +* a workqueue execution path, thread will be 
> > killed.
> > +* That is definitly sad, but not to make the 
> > situation
> > +* even worse we have to ignore dead tasks in order 
> > not
> > +* to step on zeroed out members (e.g. 
> > t->vfork_done is
> > +* already NULL on that path, since we were called 
> > by
> > +* do_exit()))

And we have more problems like this. Say, if blk_flush_plug_list()
crashes it will likely crash again and again recursively.

> >  */
> > -   if (prev->flags & PF_WQ_WORKER) {
> > +   if (prev->flags & PF_WQ_WORKER &&
> > +   prev->state != TASK_DEAD) {

I don't think we should change __schedule()... Can't we simply clear
PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
into wq_worker_sleeping() and wq_worker_waking_up().

Or perhaps something like the change below.

Oleg.

--- x/kernel/workqueue.c
+++ x/kernel/workqueue.c
@@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
}
 }
 
+static void oops_handler(struct callback_head *oops_work)
+{
+   if (!(current->flags & PF_WQ_WORKER))
+   return;
+
+   clear PF_WQ_WORKER, probably do more cleanups
+}
+
 /**
  * worker_thread - the worker thread function
  * @__worker: self
@@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
  */
 static int worker_thread(void *__worker)
 {
+   struct callback_head oops_work;
struct worker *worker = __worker;
struct worker_pool *pool = worker->pool;
 
/* tell the scheduler that this is a workqueue worker */
worker->task->flags |= PF_WQ_WORKER;
+   init_task_work(_work, oops_handler);
+   task_work_add(current, _work, false);
 woke_up:
spin_lock_irq(>lock);
 



Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-21 Thread Oleg Nesterov
On 10/20, Andy Lutomirski wrote:
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
> >  * If a worker went to sleep, notify and ask 
> > workqueue
> >  * whether it wants to wake up a task to maintain
> >  * concurrency.
> > +*
> > +* Also the following stack is possible:
> > +*oops_end()
> > +*do_exit()
> > +*schedule()
> > +*
> > +* If panic_on_oops is not set and oops happens on
> > +* a workqueue execution path, thread will be 
> > killed.
> > +* That is definitly sad, but not to make the 
> > situation
> > +* even worse we have to ignore dead tasks in order 
> > not
> > +* to step on zeroed out members (e.g. 
> > t->vfork_done is
> > +* already NULL on that path, since we were called 
> > by
> > +* do_exit()))

And we have more problems like this. Say, if blk_flush_plug_list()
crashes it will likely crash again and again recursively.

> >  */
> > -   if (prev->flags & PF_WQ_WORKER) {
> > +   if (prev->flags & PF_WQ_WORKER &&
> > +   prev->state != TASK_DEAD) {

I don't think we should change __schedule()... Can't we simply clear
PF_WQ_WORKER in complete_vfork_done() ? Or add the PF_EXITING checks
into wq_worker_sleeping() and wq_worker_waking_up().

Or perhaps something like the change below.

Oleg.

--- x/kernel/workqueue.c
+++ x/kernel/workqueue.c
@@ -2157,6 +2157,14 @@ static void process_scheduled_works(stru
}
 }
 
+static void oops_handler(struct callback_head *oops_work)
+{
+   if (!(current->flags & PF_WQ_WORKER))
+   return;
+
+   clear PF_WQ_WORKER, probably do more cleanups
+}
+
 /**
  * worker_thread - the worker thread function
  * @__worker: self
@@ -2171,11 +2179,14 @@ static void process_scheduled_works(stru
  */
 static int worker_thread(void *__worker)
 {
+   struct callback_head oops_work;
struct worker *worker = __worker;
struct worker_pool *pool = worker->pool;
 
/* tell the scheduler that this is a workqueue worker */
worker->task->flags |= PF_WQ_WORKER;
+   init_task_work(_work, oops_handler);
+   task_work_add(current, _work, false);
 woke_up:
spin_lock_irq(>lock);
 



Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-20 Thread Peter Zijlstra
On Wed, Sep 21, 2016 at 05:43:50PM +0200, Roman Pen wrote:
> If panic_on_oops is not set and oops happens inside workqueue kthread,
> kernel kills this kthread.  Current patch fixes recursive GPF which
> happens in that case with the following stack:


> The root cause is that zeroed task->vfork_done member is accessed from
> wq_worker_sleeping() hook. 

This is the kthread_data() -> to_kthread() thing? Could've done with
spelling out, now you had me searching all over :/


Urgh what a mess..


Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-20 Thread Peter Zijlstra
On Wed, Sep 21, 2016 at 05:43:50PM +0200, Roman Pen wrote:
> If panic_on_oops is not set and oops happens inside workqueue kthread,
> kernel kills this kthread.  Current patch fixes recursive GPF which
> happens in that case with the following stack:


> The root cause is that zeroed task->vfork_done member is accessed from
> wq_worker_sleeping() hook. 

This is the kthread_data() -> to_kthread() thing? Could've done with
spelling out, now you had me searching all over :/


Urgh what a mess..


Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-20 Thread Andy Lutomirski
On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen
 wrote:
> If panic_on_oops is not set and oops happens inside workqueue kthread,
> kernel kills this kthread.  Current patch fixes recursive GPF which
> happens in that case with the following stack:

Oleg, can you take a look at this?

--Andy

>
>   [] dump_stack+0x68/0x93
>   [] ? do_exit+0x7ab/0xc10
>   [] __schedule_bug+0x83/0xe0
>   [] __schedule+0x7ea/0xba0
>   [] ? vprintk_default+0x1f/0x30
>   [] ? printk+0x48/0x50
>   [] schedule+0x40/0x90
>   [] do_exit+0x9ca/0xc10
>   [] ? kmsg_dump+0x11d/0x190
>   [] ? kmsg_dump+0x17/0x190
>   [] oops_end+0x99/0xd0
>   [] no_context+0x185/0x3e0
>   [] __bad_area_nosemaphore+0x83/0x1c0
>   [] ? vprintk_emit+0x25e/0x530
>   [] bad_area_nosemaphore+0x14/0x20
>   [] __do_page_fault+0xac/0x570
>   [] ? console_trylock+0x1e/0xe0
>   [] ? trace_hardirqs_off_thunk+0x1a/0x1c
>   [] do_page_fault+0xc/0x10
>   [] page_fault+0x22/0x30
>   [] ? kthread_data+0x33/0x40
>   [] ? wq_worker_sleeping+0xe/0x80
>   [] __schedule+0x47b/0xba0
>   [] schedule+0x40/0x90
>   [] do_exit+0x7dd/0xc10
>   [] oops_end+0x99/0xd0
>
> The root cause is that zeroed task->vfork_done member is accessed from
> wq_worker_sleeping() hook.  The zeroing out happens on the following
> path:
>
>oops_end()
>do_exit()
>exit_mm()
>mm_release()
>complete_vfork_done()
>
> In order to fix a bug dead tasks must be ignored.
>
> Signed-off-by: Roman Pen 
> Cc: Andy Lutomirski 
> Cc: Josh Poimboeuf 
> Cc: Borislav Petkov 
> Cc: Brian Gerst 
> Cc: Denys Vlasenko 
> Cc: H. Peter Anvin 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Tejun Heo 
> Cc: linux-kernel@vger.kernel.org
> ---
>  kernel/sched/core.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2c303e7..50772e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>  * If a worker went to sleep, notify and ask workqueue
>  * whether it wants to wake up a task to maintain
>  * concurrency.
> +*
> +* Also the following stack is possible:
> +*oops_end()
> +*do_exit()
> +*schedule()
> +*
> +* If panic_on_oops is not set and oops happens on
> +* a workqueue execution path, thread will be killed.
> +* That is definitly sad, but not to make the 
> situation
> +* even worse we have to ignore dead tasks in order 
> not
> +* to step on zeroed out members (e.g. t->vfork_done 
> is
> +* already NULL on that path, since we were called by
> +* do_exit()))
>  */
> -   if (prev->flags & PF_WQ_WORKER) {
> +   if (prev->flags & PF_WQ_WORKER &&
> +   prev->state != TASK_DEAD) {
> struct task_struct *to_wakeup;
>
> to_wakeup = wq_worker_sleeping(prev);
> --
> 2.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-10-20 Thread Andy Lutomirski
On Wed, Sep 21, 2016 at 8:43 AM, Roman Pen
 wrote:
> If panic_on_oops is not set and oops happens inside workqueue kthread,
> kernel kills this kthread.  Current patch fixes recursive GPF which
> happens in that case with the following stack:

Oleg, can you take a look at this?

--Andy

>
>   [] dump_stack+0x68/0x93
>   [] ? do_exit+0x7ab/0xc10
>   [] __schedule_bug+0x83/0xe0
>   [] __schedule+0x7ea/0xba0
>   [] ? vprintk_default+0x1f/0x30
>   [] ? printk+0x48/0x50
>   [] schedule+0x40/0x90
>   [] do_exit+0x9ca/0xc10
>   [] ? kmsg_dump+0x11d/0x190
>   [] ? kmsg_dump+0x17/0x190
>   [] oops_end+0x99/0xd0
>   [] no_context+0x185/0x3e0
>   [] __bad_area_nosemaphore+0x83/0x1c0
>   [] ? vprintk_emit+0x25e/0x530
>   [] bad_area_nosemaphore+0x14/0x20
>   [] __do_page_fault+0xac/0x570
>   [] ? console_trylock+0x1e/0xe0
>   [] ? trace_hardirqs_off_thunk+0x1a/0x1c
>   [] do_page_fault+0xc/0x10
>   [] page_fault+0x22/0x30
>   [] ? kthread_data+0x33/0x40
>   [] ? wq_worker_sleeping+0xe/0x80
>   [] __schedule+0x47b/0xba0
>   [] schedule+0x40/0x90
>   [] do_exit+0x7dd/0xc10
>   [] oops_end+0x99/0xd0
>
> The root cause is that zeroed task->vfork_done member is accessed from
> wq_worker_sleeping() hook.  The zeroing out happens on the following
> path:
>
>oops_end()
>do_exit()
>exit_mm()
>mm_release()
>complete_vfork_done()
>
> In order to fix a bug dead tasks must be ignored.
>
> Signed-off-by: Roman Pen 
> Cc: Andy Lutomirski 
> Cc: Josh Poimboeuf 
> Cc: Borislav Petkov 
> Cc: Brian Gerst 
> Cc: Denys Vlasenko 
> Cc: H. Peter Anvin 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Tejun Heo 
> Cc: linux-kernel@vger.kernel.org
> ---
>  kernel/sched/core.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2c303e7..50772e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
>  * If a worker went to sleep, notify and ask workqueue
>  * whether it wants to wake up a task to maintain
>  * concurrency.
> +*
> +* Also the following stack is possible:
> +*oops_end()
> +*do_exit()
> +*schedule()
> +*
> +* If panic_on_oops is not set and oops happens on
> +* a workqueue execution path, thread will be killed.
> +* That is definitly sad, but not to make the 
> situation
> +* even worse we have to ignore dead tasks in order 
> not
> +* to step on zeroed out members (e.g. t->vfork_done 
> is
> +* already NULL on that path, since we were called by
> +* do_exit()))
>  */
> -   if (prev->flags & PF_WQ_WORKER) {
> +   if (prev->flags & PF_WQ_WORKER &&
> +   prev->state != TASK_DEAD) {
> struct task_struct *to_wakeup;
>
> to_wakeup = wq_worker_sleeping(prev);
> --
> 2.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-09-21 Thread Roman Pen
If panic_on_oops is not set and oops happens inside workqueue kthread,
kernel kills this kthread.  Current patch fixes recursive GPF which
happens in that case with the following stack:

  [] dump_stack+0x68/0x93
  [] ? do_exit+0x7ab/0xc10
  [] __schedule_bug+0x83/0xe0
  [] __schedule+0x7ea/0xba0
  [] ? vprintk_default+0x1f/0x30
  [] ? printk+0x48/0x50
  [] schedule+0x40/0x90
  [] do_exit+0x9ca/0xc10
  [] ? kmsg_dump+0x11d/0x190
  [] ? kmsg_dump+0x17/0x190
  [] oops_end+0x99/0xd0
  [] no_context+0x185/0x3e0
  [] __bad_area_nosemaphore+0x83/0x1c0
  [] ? vprintk_emit+0x25e/0x530
  [] bad_area_nosemaphore+0x14/0x20
  [] __do_page_fault+0xac/0x570
  [] ? console_trylock+0x1e/0xe0
  [] ? trace_hardirqs_off_thunk+0x1a/0x1c
  [] do_page_fault+0xc/0x10
  [] page_fault+0x22/0x30
  [] ? kthread_data+0x33/0x40
  [] ? wq_worker_sleeping+0xe/0x80
  [] __schedule+0x47b/0xba0
  [] schedule+0x40/0x90
  [] do_exit+0x7dd/0xc10
  [] oops_end+0x99/0xd0

The root cause is that zeroed task->vfork_done member is accessed from
wq_worker_sleeping() hook.  The zeroing out happens on the following
path:

   oops_end()
   do_exit()
   exit_mm()
   mm_release()
   complete_vfork_done()

In order to fix a bug dead tasks must be ignored.

Signed-off-by: Roman Pen 
Cc: Andy Lutomirski 
Cc: Josh Poimboeuf 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2c303e7..50772e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
 * If a worker went to sleep, notify and ask workqueue
 * whether it wants to wake up a task to maintain
 * concurrency.
+*
+* Also the following stack is possible:
+*oops_end()
+*do_exit()
+*schedule()
+*
+* If panic_on_oops is not set and oops happens on
+* a workqueue execution path, thread will be killed.
+* That is definitly sad, but not to make the situation
+* even worse we have to ignore dead tasks in order not
+* to step on zeroed out members (e.g. t->vfork_done is
+* already NULL on that path, since we were called by
+* do_exit()))
 */
-   if (prev->flags & PF_WQ_WORKER) {
+   if (prev->flags & PF_WQ_WORKER &&
+   prev->state != TASK_DEAD) {
struct task_struct *to_wakeup;
 
to_wakeup = wq_worker_sleeping(prev);
-- 
2.9.3



[PATCH 2/2] sched: do not call workqueue sleep hook if task is already dead

2016-09-21 Thread Roman Pen
If panic_on_oops is not set and oops happens inside workqueue kthread,
kernel kills this kthread.  Current patch fixes recursive GPF which
happens in that case with the following stack:

  [] dump_stack+0x68/0x93
  [] ? do_exit+0x7ab/0xc10
  [] __schedule_bug+0x83/0xe0
  [] __schedule+0x7ea/0xba0
  [] ? vprintk_default+0x1f/0x30
  [] ? printk+0x48/0x50
  [] schedule+0x40/0x90
  [] do_exit+0x9ca/0xc10
  [] ? kmsg_dump+0x11d/0x190
  [] ? kmsg_dump+0x17/0x190
  [] oops_end+0x99/0xd0
  [] no_context+0x185/0x3e0
  [] __bad_area_nosemaphore+0x83/0x1c0
  [] ? vprintk_emit+0x25e/0x530
  [] bad_area_nosemaphore+0x14/0x20
  [] __do_page_fault+0xac/0x570
  [] ? console_trylock+0x1e/0xe0
  [] ? trace_hardirqs_off_thunk+0x1a/0x1c
  [] do_page_fault+0xc/0x10
  [] page_fault+0x22/0x30
  [] ? kthread_data+0x33/0x40
  [] ? wq_worker_sleeping+0xe/0x80
  [] __schedule+0x47b/0xba0
  [] schedule+0x40/0x90
  [] do_exit+0x7dd/0xc10
  [] oops_end+0x99/0xd0

The root cause is that zeroed task->vfork_done member is accessed from
wq_worker_sleeping() hook.  The zeroing out happens on the following
path:

   oops_end()
   do_exit()
   exit_mm()
   mm_release()
   complete_vfork_done()

In order to fix a bug dead tasks must be ignored.

Signed-off-by: Roman Pen 
Cc: Andy Lutomirski 
Cc: Josh Poimboeuf 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2c303e7..50772e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3380,8 +3380,22 @@ static void __sched notrace __schedule(bool preempt)
 * If a worker went to sleep, notify and ask workqueue
 * whether it wants to wake up a task to maintain
 * concurrency.
+*
+* Also the following stack is possible:
+*oops_end()
+*do_exit()
+*schedule()
+*
+* If panic_on_oops is not set and oops happens on
+* a workqueue execution path, thread will be killed.
+* That is definitly sad, but not to make the situation
+* even worse we have to ignore dead tasks in order not
+* to step on zeroed out members (e.g. t->vfork_done is
+* already NULL on that path, since we were called by
+* do_exit()))
 */
-   if (prev->flags & PF_WQ_WORKER) {
+   if (prev->flags & PF_WQ_WORKER &&
+   prev->state != TASK_DEAD) {
struct task_struct *to_wakeup;
 
to_wakeup = wq_worker_sleeping(prev);
-- 
2.9.3