Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec

2017-04-06 Thread Oleg Nesterov
On 04/05, Eric W. Biederman wrote:
>
> Oleg Nesterov  writes:
>
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
> >>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
> >>  		goto ret;
> >>
> >> +	/* Don't allow thread group signals after exec */
> >> +	if (group && (t->signal->exec_id != t->self_exec_id))
> >> +		goto ret;
> >
> > Hmm. Either we do not need this exec_id check at all, or we should not
> > take "group" into account; a fatal signal (say SIGKILL) will kill the
> > whole thread-group.
>
> Wow.  Those are crazy semantics for fatal signals.  Sending a tkill
> should not affect the entire thread group.

How so? SIGKILL or any fatal signal should kill the whole process, even if
it was sent by tkill().

> Oleg I think this is a bug
> you introduced and likely requires a separate fix.
>
> I really don't understand the logic in:
>
> commit 5fcd835bf8c2cde06404559b1904e2f1dfcb4567
> Author: Oleg Nesterov 
> Date:   Wed Apr 30 00:52:55 2008 -0700
>
> signals: use __group_complete_signal() for the specific signals too

No. You can even forget about "send" path for the moment. Just suppose that
a thread dequeues SIGKILL sent by tkill(). In this case it will call
do_group_exit() and kill the group anyway. It is not possible to kill an
individual thread, and linux never did this.
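
The behaviour described above can be observed from user space. A minimal
sketch, assuming Linux with glibc and pthreads; the helper names (worker,
child_main, tkill_sigkill_demo) are made up for this illustration and appear
nowhere in the kernel or in this thread. A child process starts a second
thread, sends SIGKILL to that thread only via tgkill(), and the parent then
checks how the child terminated:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe used by the worker thread to report its tid to the main thread. */
static int tid_pipe[2];

/* Worker thread: publish our tid, then sleep forever. */
static void *worker(void *arg)
{
	pid_t tid = (pid_t)syscall(SYS_gettid);

	(void)arg;
	if (write(tid_pipe[1], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
		_exit(2);
	for (;;)
		pause();
	return NULL;
}

/* Runs in the forked child: start a worker, SIGKILL only that thread. */
static int child_main(void)
{
	pthread_t th;
	pid_t tid;

	if (pipe(tid_pipe) != 0)
		return 2;
	if (pthread_create(&th, NULL, worker, NULL) != 0)
		return 2;
	if (read(tid_pipe[0], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
		return 2;

	/* Thread-directed signal: names the worker's tid, not the process. */
	syscall(SYS_tgkill, getpid(), tid, SIGKILL);

	/* Only reached if the kill were per-thread: exit normally. */
	sleep(1);
	return 0;
}

/*
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 on error.
 */
static int tkill_sigkill_demo(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_main());
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Built with "cc -pthread" and called from main(), tkill_sigkill_demo() should
return SIGKILL: the parent sees the whole child process killed by the signal,
not just the one thread that tgkill() named.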

Afaics, this commit also fixes the case when SIGKILL can be lost when tkill()
races with the exiting target. Or if the target is a zombie-leader. Exactly
because they obviously can't dequeue SIGKILL.

Plus we want to shutdown the whole thread-group "asap", that is why
complete_signal() sets SIGNAL_GROUP_EXIT and sends SIGKILL to other threads
in the "send" path.

This btw reminds me that we want to do the same with sig_kernel_coredump()
signals too, but this is not simple.

> >> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
> >>  		 * must see ->sighand == NULL.
> >>  		 */
> >>  		spin_lock(&sighand->siglock);
> >> -		if (likely(sighand == tsk->sighand)) {
> >> +		if (likely((sighand == tsk->sighand) &&
> >> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {
> >
> > Oh, this doesn't look good to me. Yes, with your approach we probably need
> > this to, say, ensure that posix-cpu-timer can't kill the process after exec,
> > but I'd rather add the exit_state check into run_posix_timers().
>
> The entire point of lock_task_sighand is to not operate on
> tasks/processes that have exited.

Well, the entire point of lock_task_sighand() is to take ->siglock if possible.

> The fact it even has sighand in there is
> deceptive because it is all about siglock and nothing to do with
> sighand.

Not sure I understand what you mean...

Yes, lock_task_sighand() can obviously fail, and yes the failure is used
as an indication that this thread has gone. But a zombie thread controlled
by the parent/debugger has not gone yet.

> > 
> > Now let's fix another problem. A mt exec succeeds and the application does
> > sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
> > another (zombie) SECCOMP_MODE_FILTER thread.
> >
> > And after we fix this problem, what else will we need to fix?
> >
> >
> > I really think that - whatever we do - there should be no other threads
> > after exec, even zombies.
>
> I see where you are coming from.
>
> I need to stare at this a bit longer.  Because you are right.  Reusing
> the signal_struct and leaving zombies around is very prone to bugs.  So
> it is not very maintainable.

Yes, yes, yes. This is what I was arguing with.

> I suspect the answer here is to simply allocate a new sighand_struct and
> a new signal_struct if we are not single threaded by the time we
> get down to the end of de_thread.

Maybe. Not sure. Looks very nontrivial.

And I still think that if we do this, we should fix the bug first, then try
to do something like this.

Oleg.



Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec

2017-04-05 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 04/02, Eric W. Biederman wrote:
>>
>> Add exec_id to signal_struct and compare it at a few choice moments.
>
> I really dislike this change no matter what, sorry.
>
> Firstly, task_struct->*_exec_id should simply die (I already have the
> patch), or at least they should be moved into signal_struct simply
> because this is a per-process thing.

I am quite happy to find a better way to implement this.  More than
anything this was my proof of concept that it is possible to close
the security holes created if we allow our zombies to be normal zombies.


>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
>>  		goto ret;
>>
>> +	/* Don't allow thread group signals after exec */
>> +	if (group && (t->signal->exec_id != t->self_exec_id))
>> +		goto ret;
>
> Hmm. Either we do not need this exec_id check at all, or we should not
> take "group" into account; a fatal signal (say SIGKILL) will kill the
> whole thread-group.

Wow.  Those are crazy semantics for fatal signals.  Sending a tkill
should not affect the entire thread group.  Oleg I think this is a bug
you introduced and likely requires a separate fix.

I really don't understand the logic in:

commit 5fcd835bf8c2cde06404559b1904e2f1dfcb4567
Author: Oleg Nesterov 
Date:   Wed Apr 30 00:52:55 2008 -0700

signals: use __group_complete_signal() for the specific signals too

Based on Pavel Emelyanov's suggestion.

Rename __group_complete_signal() to complete_signal() and use it to process
the specific signals too.  To do this we simply add the "int group" argument.

This allows us to greatly simplify the signal-sending code and adds a useful
behaviour change.  We can avoid the unneeded wakeups for the private signals
because wants_signal() is more clever than sigismember(blocked), but more
importantly we now take into account the fatal specific signals too.

The latter allows us to kill some subtle checks in handle_stop_signal() and
makes the specific/group signal's behaviour more consistent.  For example,
currently sigtimedwait(FATAL_SIGNAL) behaves differently depending on whether
the signal was sent by kill() or tkill() if the signal was not blocked.

And.  This allows us to tweak/fix the behaviour when the specific signal is
sent to the dying/dead ->group_leader.

Signed-off-by: Pavel Emelyanov 
Signed-off-by: Oleg Nesterov 
Cc: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

>> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>>  		 * must see ->sighand == NULL.
>>  		 */
>>  		spin_lock(&sighand->siglock);
>> -		if (likely(sighand == tsk->sighand)) {
>> +		if (likely((sighand == tsk->sighand) &&
>> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {
>
> Oh, this doesn't look good to me. Yes, with your approach we probably need
> this to, say, ensure that posix-cpu-timer can't kill the process after exec,
> but I'd rather add the exit_state check into run_posix_timers().

The entire point of lock_task_sighand is to not operate on
tasks/processes that have exited. The fact it even has sighand in there is
deceptive because it is all about siglock and nothing to do with
sighand.

> But OK, suppose that we fix the problems with signal-after-exec.
>
> 
> Now let's fix another problem. A mt exec succeeds and the application does
> sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
> another (zombie) SECCOMP_MODE_FILTER thread.
>
> And after we fix this problem, what else will we need to fix?
>
>
> I really think that - whatever we do - there should be no other threads
> after exec, even zombies.

I see where you are coming from.

I need to stare at this a bit longer.  Because you are right.  Reusing
the signal_struct and leaving zombies around is very prone to bugs.  So
it is not very maintainable.

I suspect the answer here is to simply allocate a new sighand_struct and
a new signal_struct if we are not single threaded by the time we
get down to the end of de_thread.

However, even if it is a case of whack-a-mole, semantically
not-blocking-for-zombies looks like the right thing to do, and we need to
figure out how to do it maintainably.

Eric


Re: [RFC][PATCH v2 5/5] signal: Don't allow accessing signal_struct by old threads after exec

2017-04-05 Thread Oleg Nesterov
On 04/02, Eric W. Biederman wrote:
>
> Add exec_id to signal_struct and compare it at a few choice moments.

I really dislike this change no matter what, sorry.

Firstly, task_struct->*_exec_id should simply die (I already have the
patch), or at least they should be moved into signal_struct simply
because this is a per-process thing.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -995,6 +995,10 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  			from_ancestor_ns || (info == SEND_SIG_FORCED)))
>  		goto ret;
>
> +	/* Don't allow thread group signals after exec */
> +	if (group && (t->signal->exec_id != t->self_exec_id))
> +		goto ret;

Hmm. Either we do not need this exec_id check at all, or we should not
take "group" into account; a fatal signal (say SIGKILL) will kill the
whole thread-group.
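
A toy model of the quoted check may make the objection concrete. This is only
a sketch under assumptions: the toy_* types and helpers are hypothetical
stand-ins, not kernel code, and the exec_id values 1 and 2 are arbitrary. It
assumes, as the patch description suggests, that exec bumps signal->exec_id
while an old thread keeps its stale self_exec_id:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins holding only the fields the check reads. */
struct toy_signal {
	unsigned int exec_id;
};

struct toy_task {
	unsigned int self_exec_id;
	struct toy_signal *signal;
};

/* Same condition as the hunk above: drop group signals to stale threads. */
static bool toy_patch_drops(const struct toy_task *t, bool group)
{
	return group && (t->signal->exec_id != t->self_exec_id);
}

/*
 * Build a thread left over from before exec and report whether a signal
 * with the given "group" flag would be dropped by the check.
 */
static bool toy_stale_thread_dropped(bool group)
{
	struct toy_signal sig = { .exec_id = 2 };	/* bumped by exec */
	struct toy_task stale = { .self_exec_id = 1, .signal = &sig };

	return toy_patch_drops(&stale, group);
}
```

In this model toy_stale_thread_dropped(true) is true but
toy_stale_thread_dropped(false) is false: with group == 0 (the tkill() case)
the check never fires, so a thread-directed fatal signal would still get
through to the old thread.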

> @@ -1247,7 +1251,8 @@ struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
>  		 * must see ->sighand == NULL.
>  		 */
>  		spin_lock(&sighand->siglock);
> -		if (likely(sighand == tsk->sighand)) {
> +		if (likely((sighand == tsk->sighand) &&
> +			   (tsk->self_exec_id == tsk->signal->exec_id))) {

Oh, this doesn't look good to me. Yes, with your approach we probably need
this to, say, ensure that posix-cpu-timer can't kill the process after exec,
but I'd rather add the exit_state check into run_posix_timers().

But OK, suppose that we fix the problems with signal-after-exec.


Now let's fix another problem. A mt exec succeeds and the application does
sys_seccomp(SECCOMP_FILTER_FLAG_TSYNC) which fails because it finds
another (zombie) SECCOMP_MODE_FILTER thread.

And after we fix this problem, what else will we need to fix?


I really think that - whatever we do - there should be no other threads
after exec, even zombies.

Oleg.


