Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> On 06/27, Oleg Nesterov wrote:
>>
>> On 06/27, Andy Lutomirski wrote:
>> >
>> > Want to send a patch?  I could do it, but you understand this code
>> > much better than I do.
>>
>> Well, I'll try to do this tomorrow unless you do it.
>
> I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
> somehow, but it seems this is not possible, finish_task_switch() does
> free_thread_stack() unconditionally.
>
> Then how (say) proc_pid_stack() can work? If it hits the task which is
> alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>

I changed save_stack_trace_tsk() to use try_get_task_stack().  I think
that's sufficient to fix this.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> On 06/27, Oleg Nesterov wrote:
>>
>> On 06/27, Andy Lutomirski wrote:
>> >
>> > Want to send a patch?  I could do it, but you understand this code
>> > much better than I do.
>>
>> Well, I'll try to do this tomorrow unless you do it.
>
> I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
> somehow, but it seems this is not possible, finish_task_switch() does
> free_thread_stack() unconditionally.
>
> Then how (say) proc_pid_stack() can work? If it hits the task which is
> alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>

I changed save_stack_trace_tsk() to use try_get_task_stack().  I think
that's sufficient to fix this.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 3:59 PM, Oleg Nesterov  wrote:
> On 06/28, Andy Lutomirski wrote:
>>
>> On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
>> >
>> > So please forget unless you see another reason for this change.
>> >
>>
>> But I might need to that anyway for procfs to read the the stack,
>> right?  Do you see another way to handle that case?
>
> Well, we could use probe_kernel_text() and recheck tsk->stack != NULL
> after this.
>
> But,
>
>> I'm thinking of adding:
>>
>> void *try_get_task_stack(struct task_struct *tsk);
>> void put_task_stack(struct task_struct *tsk);
>
> Yes, agreed, this looks better.
>
> Oleg.
>

I pushed that change to my tree (seems to work well enough to boot
without warnings as long as I don't unmount XFS, but not particularly
well tested).  Want to refresh your patch on top?

I'll probably have to split the patch to introduce no-op
try_get_task_stack / put_task_stack first so I can avoid breaking
bisection, but the interface shouldn't change unless something's wrong
with it.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-29 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 3:59 PM, Oleg Nesterov  wrote:
> On 06/28, Andy Lutomirski wrote:
>>
>> On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
>> >
>> > So please forget unless you see another reason for this change.
>> >
>>
>> But I might need to that anyway for procfs to read the the stack,
>> right?  Do you see another way to handle that case?
>
> Well, we could use probe_kernel_text() and recheck tsk->stack != NULL
> after this.
>
> But,
>
>> I'm thinking of adding:
>>
>> void *try_get_task_stack(struct task_struct *tsk);
>> void put_task_stack(struct task_struct *tsk);
>
> Yes, agreed, this looks better.
>
> Oleg.
>

I pushed that change to my tree (seems to work well enough to boot
without warnings as long as I don't unmount XFS, but not particularly
well tested).  Want to refresh your patch on top?

I'll probably have to split the patch to introduce no-op
try_get_task_stack / put_task_stack first so I can avoid breaking
bisection, but the interface shouldn't change unless something's wrong
with it.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote:
>
> On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
> >
> > So please forget unless you see another reason for this change.
> >
>
> But I might need to that anyway for procfs to read the the stack,
> right?  Do you see another way to handle that case?

Well, we could use probe_kernel_text() and recheck tsk->stack != NULL
after this.

But,

> I'm thinking of adding:
>
> void *try_get_task_stack(struct task_struct *tsk);
> void put_task_stack(struct task_struct *tsk);

Yes, agreed, this looks better.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote:
>
> On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
> >
> > So please forget unless you see another reason for this change.
> >
>
> But I might need to that anyway for procfs to read the the stack,
> right?  Do you see another way to handle that case?

Well, we could use probe_kernel_text() and recheck tsk->stack != NULL
after this.

But,

> I'm thinking of adding:
>
> void *try_get_task_stack(struct task_struct *tsk);
> void put_task_stack(struct task_struct *tsk);

Yes, agreed, this looks better.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Linus Torvalds wrote:
>
> Then try_get_task_stack(tsk) becomes
>
>  void *try_get_task_stack(struct task_struct *tsk)
>  {
>   void *stack = tsk->stack;
>   if (!atomic_inc_not_zero(>stackref))
>stack = NULL;
>   return stack;
>  }

Yes, and then we can trivilly fix the users of to_live_kthread().
So I'll wait for this change and send the simple fix on top of it.

Otherwise I'll send another ugly hack (see below).

Oleg.
---

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index e691b6a..7667bc62 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -37,6 +37,7 @@ struct task_struct *kthread_create_on_cpu(int 
(*threadfn)(void *data),
__k;   \
 })
 
+void set_kthread_struct(void *kthread);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
diff --git a/kernel/fork.c b/kernel/fork.c
index 97122f9..8643248 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -326,6 +326,23 @@ void release_task_stack(struct task_struct *tsk)
 #endif
 }
 
+void set_kthread_struct(void *kthread)
+{
+   /*
+* This is the ugly but simple hack we will hopefully remove soon.
+* We abuse ->set_child_tid to avoid the new member and because it
+* can't be wrongly copied by copy_process(). We also rely on fact
+* that the caller can't exec, so PF_KTHREAD can't be cleared.
+*/
+   current->set_child_tid = (__force void __user *)kthread;
+}
+
+static void free_kthread_struct(struct task_struct *tsk)
+{
+   if (tsk->flags & PF_KTHREAD)
+   kfree((__force void *)tsk->set_child_tid); /* can be NULL */
+}
+
 void free_task(struct task_struct *tsk)
 {
 #ifndef CONFIG_THREAD_INFO_IN_TASK
@@ -345,6 +362,7 @@ void free_task(struct task_struct *tsk)
ftrace_graph_exit_task(tsk);
put_seccomp_filter(tsk);
arch_release_task_struct(tsk);
+   free_kthread_struct(tsk);
free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173d..3a4921f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -181,14 +181,11 @@ static int kthread(void *_create)
int (*threadfn)(void *data) = create->threadfn;
void *data = create->data;
struct completion *done;
-   struct kthread self;
+   struct kthread *self;
int ret;
 
-   self.flags = 0;
-   self.data = data;
-   init_completion();
-   init_completion();
-   current->vfork_done = 
+   self = kmalloc(sizeof(*self), GFP_KERNEL);
+   set_kthread_struct(self);
 
/* If user was SIGKILLed, I release the structure. */
done = xchg(>done, NULL);
@@ -196,6 +193,19 @@ static int kthread(void *_create)
kfree(create);
do_exit(-EINTR);
}
+
+   if (!self) {
+   create->result = ERR_PTR(-ENOMEM);
+   complete(done);
+   do_exit(-ENOMEM);
+   }
+
+   self->flags = 0;
+   self->data = data;
+   init_completion(>exited);
+   init_completion(>parked);
+   current->vfork_done = >exited;
+
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
create->result = current;
@@ -203,12 +213,10 @@ static int kthread(void *_create)
schedule();
 
ret = -EINTR;
-
-   if (!test_bit(KTHREAD_SHOULD_STOP, )) {
-   __kthread_parkme();
+   if (!test_bit(KTHREAD_SHOULD_STOP, >flags)) {
+   __kthread_parkme(self);
ret = threadfn(data);
}
-   /* we can't just return, we must preserve "self" on stack */
do_exit(ret);
 }
 



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Linus Torvalds wrote:
>
> Then try_get_task_stack(tsk) becomes
>
>  void *try_get_task_stack(struct task_struct *tsk)
>  {
>   void *stack = tsk->stack;
>   if (!atomic_inc_not_zero(>stackref))
>stack = NULL;
>   return stack;
>  }

Yes, and then we can trivilly fix the users of to_live_kthread().
So I'll wait for this change and send the simple fix on top of it.

Otherwise I'll send another ugly hack (see below).

Oleg.
---

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index e691b6a..7667bc62 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -37,6 +37,7 @@ struct task_struct *kthread_create_on_cpu(int 
(*threadfn)(void *data),
__k;   \
 })
 
+void set_kthread_struct(void *kthread);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
diff --git a/kernel/fork.c b/kernel/fork.c
index 97122f9..8643248 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -326,6 +326,23 @@ void release_task_stack(struct task_struct *tsk)
 #endif
 }
 
+void set_kthread_struct(void *kthread)
+{
+   /*
+* This is the ugly but simple hack we will hopefully remove soon.
+* We abuse ->set_child_tid to avoid the new member and because it
+* can't be wrongly copied by copy_process(). We also rely on fact
+* that the caller can't exec, so PF_KTHREAD can't be cleared.
+*/
+   current->set_child_tid = (__force void __user *)kthread;
+}
+
+static void free_kthread_struct(struct task_struct *tsk)
+{
+   if (tsk->flags & PF_KTHREAD)
+   kfree((__force void *)tsk->set_child_tid); /* can be NULL */
+}
+
 void free_task(struct task_struct *tsk)
 {
 #ifndef CONFIG_THREAD_INFO_IN_TASK
@@ -345,6 +362,7 @@ void free_task(struct task_struct *tsk)
ftrace_graph_exit_task(tsk);
put_seccomp_filter(tsk);
arch_release_task_struct(tsk);
+   free_kthread_struct(tsk);
free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173d..3a4921f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -181,14 +181,11 @@ static int kthread(void *_create)
int (*threadfn)(void *data) = create->threadfn;
void *data = create->data;
struct completion *done;
-   struct kthread self;
+   struct kthread *self;
int ret;
 
-   self.flags = 0;
-   self.data = data;
-   init_completion();
-   init_completion();
-   current->vfork_done = 
+   self = kmalloc(sizeof(*self), GFP_KERNEL);
+   set_kthread_struct(self);
 
/* If user was SIGKILLed, I release the structure. */
done = xchg(>done, NULL);
@@ -196,6 +193,19 @@ static int kthread(void *_create)
kfree(create);
do_exit(-EINTR);
}
+
+   if (!self) {
+   create->result = ERR_PTR(-ENOMEM);
+   complete(done);
+   do_exit(-ENOMEM);
+   }
+
+   self->flags = 0;
+   self->data = data;
+   init_completion(>exited);
+   init_completion(>parked);
+   current->vfork_done = >exited;
+
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
create->result = current;
@@ -203,12 +213,10 @@ static int kthread(void *_create)
schedule();
 
ret = -EINTR;
-
-   if (!test_bit(KTHREAD_SHOULD_STOP, )) {
-   __kthread_parkme();
+   if (!test_bit(KTHREAD_SHOULD_STOP, >flags)) {
+   __kthread_parkme(self);
ret = threadfn(data);
}
-   /* we can't just return, we must preserve "self" on stack */
do_exit(ret);
 }
 



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:35 PM, Linus Torvalds
 wrote:
> On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski  wrote:
>
>> If so, that seems considerably more complicated than just adding a reference 
>> count.
>
> Fair enough.

Ahh, and if you put the reference count just in the task_struct (next
to the ->stack pointer), then I guess that's particularly trivial.

Then try_get_task_stack(tsk) becomes

 void *try_get_task_stack(struct task_struct *tsk)
 {
  void *stack = tsk->stack;
  if (!atomic_inc_not_zero(>stackref))
   stack = NULL;
  return stack;
 }

ok, color me convinced.

   Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:35 PM, Linus Torvalds
 wrote:
> On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski  wrote:
>
>> If so, that seems considerably more complicated than just adding a reference 
>> count.
>
> Fair enough.

Ahh, and if you put the reference count just in the task_struct (next
to the ->stack pointer), then I guess that's particularly trivial.

Then try_get_task_stack(tsk) becomes

 void *try_get_task_stack(struct task_struct *tsk)
 {
  void *stack = tsk->stack;
  if (!atomic_inc_not_zero(>stackref))
   stack = NULL;
  return stack;
 }

ok, color me convinced.

   Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski  wrote:
>
>  Or are you
> suggesting that we actually make a list somewhere of stacks that are
> nominally unused but are still around for RCU's benefit and then
> scavenge from that lest when we need a new stack?

Yes. We'd have to make our own list (rather than just use the RCU list
itself where everything goes), but..

> If so, that seems considerably more complicated than just adding a reference 
> count.

Fair enough.

 Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:21 PM, Andy Lutomirski  wrote:
>
>  Or are you
> suggesting that we actually make a list somewhere of stacks that are
> nominally unused but are still around for RCU's benefit and then
> scavenge from that lest when we need a new stack?

Yes. We'd have to make our own list (rather than just use the RCU list
itself where everything goes), but..

> If so, that seems considerably more complicated than just adding a reference 
> count.

Fair enough.

 Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds
 wrote:
> On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski  wrote:
>>
>> But I might need to that anyway for procfs to read the the stack,
>> right?  Do you see another way to handle that case?
>
> I think the other way to handle the kernel stack reading would be to
> simply make the stack freeing be RCU-delayed, and use the RCU list
> itself as the stack cache.
>
> That way reading the stack is ok in a RCU context, although you might
> end up reading a stack that has been re-used.
>
> Would that work for people?

I don't think I understand your proposal.  We already delay freeing
the stack for RCU.  If we continue doing it, then, under workloads
like my benchmark, the RCU list gets quite large.  Or are you
suggesting that we actually make a list somewhere of stacks that are
nominally unused but are still around for RCU's benefit and then
scavenge from that lest when we need a new stack?  If so, that seems
considerably more complicated than just adding a reference count.

Also, my inner security nerd says that letting /proc potentially read
the wrong process's stack is bad news.


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds
 wrote:
> On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski  wrote:
>>
>> But I might need to that anyway for procfs to read the the stack,
>> right?  Do you see another way to handle that case?
>
> I think the other way to handle the kernel stack reading would be to
> simply make the stack freeing be RCU-delayed, and use the RCU list
> itself as the stack cache.
>
> That way reading the stack is ok in a RCU context, although you might
> end up reading a stack that has been re-used.
>
> Would that work for people?

I don't think I understand your proposal.  We already delay freeing
the stack for RCU.  If we continue doing it, then, under workloads
like my benchmark, the RCU list gets quite large.  Or are you
suggesting that we actually make a list somewhere of stacks that are
nominally unused but are still around for RCU's benefit and then
scavenge from that lest when we need a new stack?  If so, that seems
considerably more complicated than just adding a reference count.

Also, my inner security nerd says that letting /proc potentially read
the wrong process's stack is bad news.


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds
 wrote:
>
> I think the other way to handle the kernel stack reading would be to
> simply make the stack freeing be RCU-delayed, and use the RCU list
> itself as the stack cache.

That said, if you end up having to have a stack reference count for
other reasons anyway, then I guess that creating a
try_get_task_stack()/put_task_stack() model is fine.

The RCU thing would only work for pure readers that can handle stale
data (which would be the stack trace thing). It wouldn't work for the
kthread to_live_kthread() case.

   Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 2:14 PM, Linus Torvalds
 wrote:
>
> I think the other way to handle the kernel stack reading would be to
> simply make the stack freeing be RCU-delayed, and use the RCU list
> itself as the stack cache.

That said, if you end up having to have a stack reference count for
other reasons anyway, then I guess that creating a
try_get_task_stack()/put_task_stack() model is fine.

The RCU thing would only work for pure readers that can handle stale
data (which would be the stack trace thing). It wouldn't work for the
kthread to_live_kthread() case.

   Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski  wrote:
>
> But I might need to that anyway for procfs to read the the stack,
> right?  Do you see another way to handle that case?

I think the other way to handle the kernel stack reading would be to
simply make the stack freeing be RCU-delayed, and use the RCU list
itself as the stack cache.

That way reading the stack is ok in a RCU context, although you might
end up reading a stack that has been re-used.

Would that work for people?

Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Linus Torvalds
On Tue, Jun 28, 2016 at 1:54 PM, Andy Lutomirski  wrote:
>
> But I might need to that anyway for procfs to read the the stack,
> right?  Do you see another way to handle that case?

I think the other way to handle the kernel stack reading would be to
simply make the stack freeing be RCU-delayed, and use the RCU list
itself as the stack cache.

That way reading the stack is ok in a RCU context, although you might
end up reading a stack that has been re-used.

Would that work for people?

Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
> On 06/28, Andy Lutomirski wrote:
>>
>> On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
>> >
>> > Then how (say) proc_pid_stack() can work? If it hits the task which is
>> > alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>> >
>> > But what if we race with the last schedule() ? "addr = *stack" can read
>> > the already vfree'ed memory, no?
>> >
>> > Looks like print_context_stack/etc need probe_kernel_address or I missed
>> > something.
>>
>> Yuck.  I suppose I could add a reference count to protect the stack.
>> Would that simplify the kthread code?
>
> Well yes, that is why I asked. So please tell me if you are going to
> do this...
>
> But we can fix kthread code without this hack which we do not need in
> the long term anyway. Unfortunaly we need to cleanup kernel/smpboot.c
> first. And I was going to do this a long ago for quite different reason ;)
>
> So please forget unless you see another reason for this change.
>

But I might need to that anyway for procfs to read the the stack,
right?  Do you see another way to handle that case?

I'm thinking of adding:

void *try_get_task_stack(struct task_struct *tsk);
void put_task_stack(struct task_struct *tsk);

where try_get_task_stack can return NULL.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 1:12 PM, Oleg Nesterov  wrote:
> On 06/28, Andy Lutomirski wrote:
>>
>> On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
>> >
>> > Then how (say) proc_pid_stack() can work? If it hits the task which is
>> > alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>> >
>> > But what if we race with the last schedule() ? "addr = *stack" can read
>> > the already vfree'ed memory, no?
>> >
>> > Looks like print_context_stack/etc need probe_kernel_address or I missed
>> > something.
>>
>> Yuck.  I suppose I could add a reference count to protect the stack.
>> Would that simplify the kthread code?
>
> Well yes, that is why I asked. So please tell me if you are going to
> do this...
>
> But we can fix kthread code without this hack which we do not need in
> the long term anyway. Unfortunaly we need to cleanup kernel/smpboot.c
> first. And I was going to do this a long ago for quite different reason ;)
>
> So please forget unless you see another reason for this change.
>

But I might need to that anyway for procfs to read the the stack,
right?  Do you see another way to handle that case?

I'm thinking of adding:

void *try_get_task_stack(struct task_struct *tsk);
void put_task_stack(struct task_struct *tsk);

where try_get_task_stack can return NULL.

--Andy


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote:
>
> On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> >
> > Then how (say) proc_pid_stack() can work? If it hits the task which is
> > alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
> >
> > But what if we race with the last schedule() ? "addr = *stack" can read
> > the already vfree'ed memory, no?
> >
> > Looks like print_context_stack/etc need probe_kernel_address or I missed
> > something.
>
> Yuck.  I suppose I could add a reference count to protect the stack.
> Would that simplify the kthread code?

Well yes, that is why I asked. So please tell me if you are going to
do this...

But we can fix kthread code without this hack which we do not need in
the long term anyway. Unfortunaly we need to cleanup kernel/smpboot.c
first. And I was going to do this a long ago for quite different reason ;)

So please forget unless you see another reason for this change.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/28, Andy Lutomirski wrote:
>
> On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> >
> > Then how (say) proc_pid_stack() can work? If it hits the task which is
> > alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
> >
> > But what if we race with the last schedule() ? "addr = *stack" can read
> > the already vfree'ed memory, no?
> >
> > Looks like print_context_stack/etc need probe_kernel_address or I missed
> > something.
>
> Yuck.  I suppose I could add a reference count to protect the stack.
> Would that simplify the kthread code?

Well yes, that is why I asked. So please tell me if you are going to
do this...

But we can fix kthread code without this hack which we do not need in
the long term anyway. Unfortunaly we need to cleanup kernel/smpboot.c
first. And I was going to do this a long ago for quite different reason ;)

So please forget unless you see another reason for this change.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> On 06/27, Oleg Nesterov wrote:
>>
>> On 06/27, Andy Lutomirski wrote:
>> >
>> > Want to send a patch?  I could do it, but you understand this code
>> > much better than I do.
>>
>> Well, I'll try to do this tomorrow unless you do it.
>
> I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
> somehow, but it seems this is not possible, finish_task_switch() does
> free_thread_stack() unconditionally.
>
> Then how (say) proc_pid_stack() can work? If it hits the task which is
> alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>
> But what if we race with the last schedule() ? "addr = *stack" can read
> the already vfree'ed memory, no?
>
> Looks like print_context_stack/etc need probe_kernel_address or I missed
> something.

Yuck.  I suppose I could add a reference count to protect the stack.
Would that simplify the kthread code?

It's too bad that all the kthread users use get_task_struct instead
of, say get_kthread (which doesn't exist).

--Andy

>
> Oleg.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Andy Lutomirski
On Tue, Jun 28, 2016 at 11:58 AM, Oleg Nesterov  wrote:
> On 06/27, Oleg Nesterov wrote:
>>
>> On 06/27, Andy Lutomirski wrote:
>> >
>> > Want to send a patch?  I could do it, but you understand this code
>> > much better than I do.
>>
>> Well, I'll try to do this tomorrow unless you do it.
>
> I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
> somehow, but it seems this is not possible, finish_task_switch() does
> free_thread_stack() unconditionally.
>
> Then how (say) proc_pid_stack() can work? If it hits the task which is
> alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.
>
> But what if we race with the last schedule() ? "addr = *stack" can read
> the already vfree'ed memory, no?
>
> Looks like print_context_stack/etc need probe_kernel_address or I missed
> something.

Yuck.  I suppose I could add a reference count to protect the stack.
Would that simplify the kthread code?

It's too bad that all the kthread users use get_task_struct instead
of, say get_kthread (which doesn't exist).

--Andy

>
> Oleg.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/27, Oleg Nesterov wrote:
>
> On 06/27, Andy Lutomirski wrote:
> >
> > Want to send a patch?  I could do it, but you understand this code
> > much better than I do.
>
> Well, I'll try to do this tomorrow unless you do it.

I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
somehow, but it seems this is not possible, finish_task_switch() does
free_thread_stack() unconditionally.

Then how (say) proc_pid_stack() can work? If it hits the task which is
alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.

But what if we race with the last schedule() ? "addr = *stack" can read
the already vfree'ed memory, no?

Looks like print_context_stack/etc need probe_kernel_address or I missed
something.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-28 Thread Oleg Nesterov
On 06/27, Oleg Nesterov wrote:
>
> On 06/27, Andy Lutomirski wrote:
> >
> > Want to send a patch?  I could do it, but you understand this code
> > much better than I do.
>
> Well, I'll try to do this tomorrow unless you do it.

I have cloned luto/linux.git to see if kthread_stop() can pin ->stack
somehow, but it seems this is not possible, finish_task_switch() does
free_thread_stack() unconditionally.

Then how (say) proc_pid_stack() can work? If it hits the task which is
alreay dead we are (probably) fine, valid_stack_ptr() should fail iiuc.

But what if we race with the last schedule() ? "addr = *stack" can read
the already vfree'ed memory, no?

Looks like print_context_stack/etc need probe_kernel_address or I missed
something.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Linus Torvalds
On Sun, Jun 26, 2016 at 10:22 PM, Andy Lutomirski  wrote:
>
> kthread_stop is *sick*.
>
> struct kthread self;
>
>
> current->vfork_done = 
>
>
> do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yeah. To be fair, that used to work. And the waiter does actually get
a reference to the task struct, and with the lifetime of the stack
historically being the same as the task struct, it was even being
fairly careful about it.

But yes, it's disgusting, and doesn't work in the new world order, and
I think it should be fairly easy to fix. Although getting the lifetime
right for a separately allocated "struct kthread_struct" might be a
bit exciting.

 Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Linus Torvalds
On Sun, Jun 26, 2016 at 10:22 PM, Andy Lutomirski  wrote:
>
> kthread_stop is *sick*.
>
> struct kthread self;
>
>
> current->vfork_done = 
>
>
> do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yeah. To be fair, that used to work. And the waiter does actually get
a reference to the task struct, and with the lifetime of the stack
historically being the same as the task struct, it was even being
fairly careful about it.

But yes, it's disgusting, and doesn't work in the new world order, and
I think it should be fairly easy to fix. Although getting the lifetime
right for a separately allocated "struct kthread_struct" might be a
bit exciting.

 Linus


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/27, Andy Lutomirski wrote:
>
> On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov  wrote:
> >
> >> Is there seriously no way to directly wait for a struct task_struct to
> >> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> >> struct kthread) and (ick!) hang it off ->vfork_done?
> >
> > Sure we can... And yes, I think we need to alloc the whole struct kthread.
> > Just another (unfortunate) complication, the current code is simple.
> >
> > And probably kthread/kthread_stop should switch to task_work_exit().
>
> Want to send a patch?  I could do it, but you understand this code
> much better than I do.

Well, I'll try to do this tomorrow unless you do it.

The problem is not the wait_for_completion(vfork_done) in kthread_stop(),
we can avoid this immediately if we change it to use task_work_add().

The problem is to_live_kthread(). And damn, it seems to me that in the
long term we can simpy kill "struct kthread" altogether. All we need is
kthread_data() and this is just a pointer. flags,cpu,parked should go
into smp_hotplug_thread.

But this needs changes, so meanwhile we will have to kmalloc() it and
free in free_task().

Or perhaps you can simply move "struct kthread" into task_struct as as
temporary/ugly but trivial fix, then we can think more.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/27, Andy Lutomirski wrote:
>
> On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov  wrote:
> >
> >> Is there seriously no way to directly wait for a struct task_struct to
> >> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> >> struct kthread) and (ick!) hang it off ->vfork_done?
> >
> > Sure we can... And yes, I think we need to alloc the whole struct kthread.
> > Just another (unfortunate) complication, the current code is simple.
> >
> > And probably kthread/kthread_stop should switch to task_work_exit().
>
> Want to send a patch?  I could do it, but you understand this code
> much better than I do.

Well, I'll try to do this tomorrow unless you do it.

The problem is not the wait_for_completion(vfork_done) in kthread_stop(),
we can avoid this immediately if we change it to use task_work_add().

The problem is to_live_kthread(). And damn, it seems to me that in the
long term we can simpy kill "struct kthread" altogether. All we need is
kthread_data() and this is just a pointer. flags,cpu,parked should go
into smp_hotplug_thread.

But this needs changes, so meanwhile we will have to kmalloc() it and
free in free_task().

Or perhaps you can simply move "struct kthread" into task_struct as as
temporary/ugly but trivial fix, then we can think more.

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Andy Lutomirski
On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov  wrote:
> On 06/26, Andy Lutomirski wrote:
>>
>> kthread_stop is *sick*.
>>
>> struct kthread self;
>>
>> ...
>>
>> current->vfork_done = 
>>
>> ...
>>
>> do_exit(ret);
>>
>> And then some other thread goes and waits for the completion, which is
>> *on the stack*, which, in any sane world (e.g. with my series
>> applied), is long gone by then.
>
> Yes, I forgot this when we discussed the problems with ti->flags/etc...
>
>> But this is broken even without any changes: since when is gcc
>> guaranteed to preserve the stack contents when a function ends with a
>> sibling call, let alone with a __noreturn call?
>
> I don't know if gcc can actually drop the stack frame in this case,
> but even if it can this looks fixeable.
>
>> Is there seriously no way to directly wait for a struct task_struct to
>> exit?  Could we, say, kmalloc the completion (or maybe even the whole
>> struct kthread) and (ick!) hang it off ->vfork_done?
>
> Sure we can... And yes, I think we need to alloc the whole struct kthread.
> Just another (unfortunate) complication, the current code is simple.
>
> And probably kthread/kthread_stop should switch to task_work_exit().

Want to send a patch?  I could do it, but you understand this code
much better than I do.


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Andy Lutomirski
On Mon, Jun 27, 2016 at 7:54 AM, Oleg Nesterov  wrote:
> On 06/26, Andy Lutomirski wrote:
>>
>> kthread_stop is *sick*.
>>
>> struct kthread self;
>>
>> ...
>>
>> current->vfork_done = 
>>
>> ...
>>
>> do_exit(ret);
>>
>> And then some other thread goes and waits for the completion, which is
>> *on the stack*, which, in any sane world (e.g. with my series
>> applied), is long gone by then.
>
> Yes, I forgot this when we discussed the problems with ti->flags/etc...
>
>> But this is broken even without any changes: since when is gcc
>> guaranteed to preserve the stack contents when a function ends with a
>> sibling call, let alone with a __noreturn call?
>
> I don't know if gcc can actually drop the stack frame in this case,
> but even if it can this looks fixeable.
>
>> Is there seriously no way to directly wait for a struct task_struct to
>> exit?  Could we, say, kmalloc the completion (or maybe even the whole
>> struct kthread) and (ick!) hang it off ->vfork_done?
>
> Sure we can... And yes, I think we need to alloc the whole struct kthread.
> Just another (unfortunate) complication, the current code is simple.
>
> And probably kthread/kthread_stop should switch to task_work_exit().

Want to send a patch?  I could do it, but you understand this code
much better than I do.


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/26, Andy Lutomirski wrote:
>
> kthread_stop is *sick*.
>
> struct kthread self;
>
> ...
>
> current->vfork_done = 
>
> ...
>
> do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yes, I forgot this when we discussed the problems with ti->flags/etc...

> But this is broken even without any changes: since when is gcc
> guaranteed to preserve the stack contents when a function ends with a
> sibling call, let alone with a __noreturn call?

I don't know if gcc can actually drop the stack frame in this case,
but even if it can this looks fixeable.

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread) and (ick!) hang it off ->vfork_done?

Sure we can... And yes, I think we need to alloc the whole struct kthread.
Just another (unfortunate) complication, the current code is simple.

And probably kthread/kthread_stop should switch to task_work_exit().

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Oleg Nesterov
On 06/26, Andy Lutomirski wrote:
>
> kthread_stop is *sick*.
>
> struct kthread self;
>
> ...
>
> current->vfork_done = 
>
> ...
>
> do_exit(ret);
>
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

Yes, I forgot this when we discussed the problems with ti->flags/etc...

> But this is broken even without any changes: since when is gcc
> guaranteed to preserve the stack contents when a function ends with a
> sibling call, let alone with a __noreturn call?

I don't know if gcc can actually drop the stack frame in this case,
but even if it can this looks fixeable.

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread) and (ick!) hang it off ->vfork_done?

Sure we can... And yes, I think we need to alloc the whole struct kthread.
Just another (unfortunate) complication, the current code is simple.

And probably kthread/kthread_stop should switch to task_work_exit().

Oleg.



Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Peter Zijlstra
On Sun, Jun 26, 2016 at 10:22:32PM -0700, Andy Lutomirski wrote:
> kthread_stop is *sick*.
> 
> struct kthread self;
> 
> ...
> 
> current->vfork_done = 
> 
> ...
> 
> do_exit(ret);
> 
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

cute

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread)

I suppose the easiest is to merge struct kthread into struct
kthread_create_info, and maybe union some bits if we're really worried
about size.

> and (ick!) hang it off ->vfork_done?

This... cleanup is going to be somewhat icky.


Re: kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-27 Thread Peter Zijlstra
On Sun, Jun 26, 2016 at 10:22:32PM -0700, Andy Lutomirski wrote:
> kthread_stop is *sick*.
> 
> struct kthread self;
> 
> ...
> 
> current->vfork_done = 
> 
> ...
> 
> do_exit(ret);
> 
> And then some other thread goes and waits for the completion, which is
> *on the stack*, which, in any sane world (e.g. with my series
> applied), is long gone by then.

cute

> Is there seriously no way to directly wait for a struct task_struct to
> exit?  Could we, say, kmalloc the completion (or maybe even the whole
> struct kthread)

I suppose the easiest is to merge struct kthread into struct
kthread_create_info, and maybe union some bits if we're really worried
about size.

> and (ick!) hang it off ->vfork_done?

This... cleanup is going to be somewhat icky.


kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-26 Thread Andy Lutomirski
My v4 series was doing pretty well until this explosion:

On Sun, Jun 26, 2016 at 9:41 PM, kernel test robot
 wrote:
>
>
> FYI, we noticed the following commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git x86/vmap_stack
> commit 26424589626d7f82d09d4e7c0569f9487b2e810a ("[DEBUG] force-enable 
> CONFIG_VMAP_STACK")
>

...

> [4.425052] BUG: unable to handle kernel paging request at c9997f18
> [4.426645] IP: [] _raw_spin_lock_irq+0x2c/0x3d
> [4.427869] PGD 1249e067 PUD 1249f067 PMD 11e4e067 PTE 0
> [4.429245] Oops: 0002 [#1] SMP
> [4.430086] Modules linked in:
> [4.430992] CPU: 0 PID: 1741 Comm: mount Not tainted 
> 4.7.0-rc4-00258-g26424589 #1
> [4.432727] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [4.434646] task: 88000d950c80 ti: 88000d950c80 task.ti: 
> 88000d950c80

Yeah, this line is meaningless with the thread_info cleanups, and I
have it fixed for v5.

> [4.436406] RIP: 0010:[]  [] 
> _raw_spin_lock_irq+0x2c/0x3d
> [4.438341] RSP: 0018:c9957c80  EFLAGS: 00010046
> [4.439438] RAX:  RBX: 7fff RCX: 
> 0a66
> [4.440735] RDX: 0001 RSI: 880013619bc0 RDI: 
> c9997f18
> [4.442035] RBP: c9957c88 R08: 00019bc0 R09: 
> 81200748
> [4.443323] R10: ea474900 R11: 0001a2a0 R12: 
> c9997f10
> [4.444614] R13: 0002 R14: c9997f18 R15: 
> ffea
> [4.445896] FS:  7f9ca6a32700() GS:88001360() 
> knlGS:
> [4.447690] CS:  0010 DS:  ES:  CR0: 80050033
> [4.448819] CR2: c9997f18 CR3: 0d87c000 CR4: 
> 06f0
> [4.450102] Stack:
> [4.450810]  c9997f18 c9957d00 81a982eb 
> 0246
> [4.452827]   c9957d00 8112584b 
> 
> [4.454838]  0246 88000e27f6bc  
> 88000e27f080
> [4.456845] Call Trace:
> [4.457616]  [] wait_for_common+0x44/0x197
> [4.458719]  [] ? try_to_wake_up+0x2dd/0x2ef
> [4.459877]  [] wait_for_completion+0x1d/0x1f
> [4.461027]  [] kthread_stop+0x82/0x10a
> [4.462125]  [] destroy_workqueue+0x10d/0x1cd
> [4.463347]  [] xfs_destroy_mount_workqueues+0x49/0x64
> [4.464620]  [] xfs_fs_fill_super+0x2c0/0x49c
> [4.465807]  [] mount_bdev+0x143/0x195
> [4.466937]  [] ? xfs_test_remount_options+0x5b/0x5b
> [4.468727]  [] xfs_fs_mount+0x15/0x17
> [4.469838]  [] mount_fs+0x15/0x8c
> [4.470882]  [] vfs_kern_mount+0x6a/0xfe
> [4.472005]  [] do_mount+0x985/0xa9a
> [4.473078]  [] ? strndup_user+0x3a/0x6a
> [4.474193]  [] SyS_mount+0x77/0x9f
> [4.475255]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
> [4.476463] Code: 66 66 66 90 55 48 89 e5 50 48 89 7d f8 fa 66 66 90 66 66 
> 90 e8 2d 0a 70 ff 65 ff 05 73 18 57 7e 31 c0 ba 01 00 00 00 48 8b 7d f8  
> 0f b1 17 85 c0 74 07 89 c6 e8 3e 20 6a ff c9 c3 66 66 66 66
> [4.484413] RIP  [] _raw_spin_lock_irq+0x2c/0x3d
> [4.485639]  RSP 
> [4.486509] CR2: c9997f18
> [4.487366] ---[ end trace 79763b41869f2580 ]---
> [4.488367] Kernel panic - not syncing: Fatal exception
>

kthread_stop is *sick*.

struct kthread self;

...

current->vfork_done = 

...

do_exit(ret);

And then some other thread goes and waits for the completion, which is
*on the stack*, which, in any sane world (e.g. with my series
applied), is long gone by then.

But this is broken even without any changes: since when is gcc
guaranteed to preserve the stack contents when a function ends with a
sibling call, let alone with a __noreturn call?

Is there seriously no way to directly wait for a struct task_struct to
exit?  Could we, say, kmalloc the completion (or maybe even the whole
struct kthread) and (ick!) hang it off ->vfork_done?

Linus, maybe it's time for you to carve another wax figurine.

--Andy


kthread_stop insanity (Re: [[DEBUG] force] 2642458962: BUG: unable to handle kernel paging request at ffffc90000997f18)

2016-06-26 Thread Andy Lutomirski
My v4 series was doing pretty well until this explosion:

On Sun, Jun 26, 2016 at 9:41 PM, kernel test robot
 wrote:
>
>
> FYI, we noticed the following commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git x86/vmap_stack
> commit 26424589626d7f82d09d4e7c0569f9487b2e810a ("[DEBUG] force-enable 
> CONFIG_VMAP_STACK")
>

...

> [4.425052] BUG: unable to handle kernel paging request at c9997f18
> [4.426645] IP: [] _raw_spin_lock_irq+0x2c/0x3d
> [4.427869] PGD 1249e067 PUD 1249f067 PMD 11e4e067 PTE 0
> [4.429245] Oops: 0002 [#1] SMP
> [4.430086] Modules linked in:
> [4.430992] CPU: 0 PID: 1741 Comm: mount Not tainted 
> 4.7.0-rc4-00258-g26424589 #1
> [4.432727] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [4.434646] task: 88000d950c80 ti: 88000d950c80 task.ti: 
> 88000d950c80

Yeah, this line is meaningless with the thread_info cleanups, and I
have it fixed for v5.

> [4.436406] RIP: 0010:[]  [] 
> _raw_spin_lock_irq+0x2c/0x3d
> [4.438341] RSP: 0018:c9957c80  EFLAGS: 00010046
> [4.439438] RAX:  RBX: 7fff RCX: 
> 0a66
> [4.440735] RDX: 0001 RSI: 880013619bc0 RDI: 
> c9997f18
> [4.442035] RBP: c9957c88 R08: 00019bc0 R09: 
> 81200748
> [4.443323] R10: ea474900 R11: 0001a2a0 R12: 
> c9997f10
> [4.444614] R13: 0002 R14: c9997f18 R15: 
> ffea
> [4.445896] FS:  7f9ca6a32700() GS:88001360() 
> knlGS:
> [4.447690] CS:  0010 DS:  ES:  CR0: 80050033
> [4.448819] CR2: c9997f18 CR3: 0d87c000 CR4: 
> 06f0
> [4.450102] Stack:
> [4.450810]  c9997f18 c9957d00 81a982eb 
> 0246
> [4.452827]   c9957d00 8112584b 
> 
> [4.454838]  0246 88000e27f6bc  
> 88000e27f080
> [4.456845] Call Trace:
> [4.457616]  [] wait_for_common+0x44/0x197
> [4.458719]  [] ? try_to_wake_up+0x2dd/0x2ef
> [4.459877]  [] wait_for_completion+0x1d/0x1f
> [4.461027]  [] kthread_stop+0x82/0x10a
> [4.462125]  [] destroy_workqueue+0x10d/0x1cd
> [4.463347]  [] xfs_destroy_mount_workqueues+0x49/0x64
> [4.464620]  [] xfs_fs_fill_super+0x2c0/0x49c
> [4.465807]  [] mount_bdev+0x143/0x195
> [4.466937]  [] ? xfs_test_remount_options+0x5b/0x5b
> [4.468727]  [] xfs_fs_mount+0x15/0x17
> [4.469838]  [] mount_fs+0x15/0x8c
> [4.470882]  [] vfs_kern_mount+0x6a/0xfe
> [4.472005]  [] do_mount+0x985/0xa9a
> [4.473078]  [] ? strndup_user+0x3a/0x6a
> [4.474193]  [] SyS_mount+0x77/0x9f
> [4.475255]  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
> [4.476463] Code: 66 66 66 90 55 48 89 e5 50 48 89 7d f8 fa 66 66 90 66 66 
> 90 e8 2d 0a 70 ff 65 ff 05 73 18 57 7e 31 c0 ba 01 00 00 00 48 8b 7d f8  
> 0f b1 17 85 c0 74 07 89 c6 e8 3e 20 6a ff c9 c3 66 66 66 66
> [4.484413] RIP  [] _raw_spin_lock_irq+0x2c/0x3d
> [4.485639]  RSP 
> [4.486509] CR2: c9997f18
> [4.487366] ---[ end trace 79763b41869f2580 ]---
> [4.488367] Kernel panic - not syncing: Fatal exception
>

kthread_stop is *sick*.

struct kthread self;

...

current->vfork_done = 

...

do_exit(ret);

And then some other thread goes and waits for the completion, which is
*on the stack*, which, in any sane world (e.g. with my series
applied), is long gone by then.

But this is broken even without any changes: since when is gcc
guaranteed to preserve the stack contents when a function ends with a
sibling call, let alone with a __noreturn call?

Is there seriously no way to directly wait for a struct task_struct to
exit?  Could we, say, kmalloc the completion (or maybe even the whole
struct kthread) and (ick!) hang it off ->vfork_done?

Linus, maybe it's time for you to carve another wax figurine.

--Andy