Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Enke Chen
Hi, Oleg:

On 10/24/18 7:02 AM, Oleg Nesterov wrote:
> On 10/23, Enke Chen wrote:
>>
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>  if (retval < 0)
>>  goto fail_creds;
>>
>> +/*
>> + * Send the pre-coredump signal to the parent if requested.
>> + */
>> +do_notify_parent_predump();
>> +cond_resched();
> 
> I am still not sure cond_resched() makes any sense here...
> 
>> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, 
>> struct task_struct *tsk)
>>  tty_audit_fork(sig);
>>  sched_autogroup_fork(sig);
>>
>> +/* Clear the pre-coredump signal for the child */
>> +sig->predump_signal = 0;
> 
> No need, copy_signal() does zalloc().
> 

Removed.

> 
>> +void do_notify_parent_predump(void)
>> +{
>> +struct sighand_struct *sighand;
>> +struct kernel_siginfo info;
>> +struct task_struct *parent;
>> +unsigned long flags;
>> +int sig;
>> +
>> +read_lock(_lock);
>> +parent = current->parent;
>> +sig = parent->signal->predump_signal;
>> +if (sig != 0) {
>> +clear_siginfo();
>> +info.si_pid = task_tgid_vnr(current);
>> +info.si_signo = sig;
>> +if (sig == SIGCHLD)
>> +info.si_code = CLD_PREDUMP;
>> +
>> +sighand = parent->sighand;
>> +spin_lock_irqsave(>siglock, flags);
>> +__group_send_sig_info(sig, , parent);
>> +spin_unlock_irqrestore(>siglock, flags);
> 
> You can just use do_send_sig_info() and remove 
> sighand/flags/spin_lock_irqsave.

Ok.

> 
> Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid
> read_lock(tasklist).

I am not sure if we should/need to deviate from the convention (locking before
access the parent). In any case it may not matter as the coredump is in the
exceptional code flow.

> 
> And I'd suggest to move it into coredump.c and make it static. It won't have
> another user.

Ok.

Thanks.  -- Enke



Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Enke Chen
Hi, Oleg:

On 10/24/18 7:02 AM, Oleg Nesterov wrote:
> On 10/23, Enke Chen wrote:
>>
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>  if (retval < 0)
>>  goto fail_creds;
>>
>> +/*
>> + * Send the pre-coredump signal to the parent if requested.
>> + */
>> +do_notify_parent_predump();
>> +cond_resched();
> 
> I am still not sure cond_resched() makes any sense here...
> 
>> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, 
>> struct task_struct *tsk)
>>  tty_audit_fork(sig);
>>  sched_autogroup_fork(sig);
>>
>> +/* Clear the pre-coredump signal for the child */
>> +sig->predump_signal = 0;
> 
> No need, copy_signal() does zalloc().
> 

Removed.

> 
>> +void do_notify_parent_predump(void)
>> +{
>> +struct sighand_struct *sighand;
>> +struct kernel_siginfo info;
>> +struct task_struct *parent;
>> +unsigned long flags;
>> +int sig;
>> +
>> +read_lock(_lock);
>> +parent = current->parent;
>> +sig = parent->signal->predump_signal;
>> +if (sig != 0) {
>> +clear_siginfo();
>> +info.si_pid = task_tgid_vnr(current);
>> +info.si_signo = sig;
>> +if (sig == SIGCHLD)
>> +info.si_code = CLD_PREDUMP;
>> +
>> +sighand = parent->sighand;
>> +spin_lock_irqsave(>siglock, flags);
>> +__group_send_sig_info(sig, , parent);
>> +spin_unlock_irqrestore(>siglock, flags);
> 
> You can just use do_send_sig_info() and remove 
> sighand/flags/spin_lock_irqsave.

Ok.

> 
> Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid
> read_lock(tasklist).

I am not sure if we should/need to deviate from the convention (locking before
access the parent). In any case it may not matter as the coredump is in the
exceptional code flow.

> 
> And I'd suggest to move it into coredump.c and make it static. It won't have
> another user.

Ok.

Thanks.  -- Enke



Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Oleg Nesterov
On 10/23, Enke Chen wrote:
>
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>   if (retval < 0)
>   goto fail_creds;
>
> + /*
> +  * Send the pre-coredump signal to the parent if requested.
> +  */
> + do_notify_parent_predump();
> + cond_resched();

I am still not sure cond_resched() makes any sense here...

> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, 
> struct task_struct *tsk)
>   tty_audit_fork(sig);
>   sched_autogroup_fork(sig);
>
> + /* Clear the pre-coredump signal for the child */
> + sig->predump_signal = 0;

No need, copy_signal() does zalloc().


> +void do_notify_parent_predump(void)
> +{
> + struct sighand_struct *sighand;
> + struct kernel_siginfo info;
> + struct task_struct *parent;
> + unsigned long flags;
> + int sig;
> +
> + read_lock(_lock);
> + parent = current->parent;
> + sig = parent->signal->predump_signal;
> + if (sig != 0) {
> + clear_siginfo();
> + info.si_pid = task_tgid_vnr(current);
> + info.si_signo = sig;
> + if (sig == SIGCHLD)
> + info.si_code = CLD_PREDUMP;
> +
> + sighand = parent->sighand;
> + spin_lock_irqsave(>siglock, flags);
> + __group_send_sig_info(sig, , parent);
> + spin_unlock_irqrestore(>siglock, flags);

You can just use do_send_sig_info() and remove sighand/flags/spin_lock_irqsave.

Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid
read_lock(tasklist).

And I'd suggest to move it into coredump.c and make it static. It won't have
another user.

Oleg.



Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-24 Thread Oleg Nesterov
On 10/23, Enke Chen wrote:
>
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>   if (retval < 0)
>   goto fail_creds;
>
> + /*
> +  * Send the pre-coredump signal to the parent if requested.
> +  */
> + do_notify_parent_predump();
> + cond_resched();

I am still not sure cond_resched() makes any sense here...

> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, 
> struct task_struct *tsk)
>   tty_audit_fork(sig);
>   sched_autogroup_fork(sig);
>
> + /* Clear the pre-coredump signal for the child */
> + sig->predump_signal = 0;

No need, copy_signal() does zalloc().


> +void do_notify_parent_predump(void)
> +{
> + struct sighand_struct *sighand;
> + struct kernel_siginfo info;
> + struct task_struct *parent;
> + unsigned long flags;
> + int sig;
> +
> + read_lock(_lock);
> + parent = current->parent;
> + sig = parent->signal->predump_signal;
> + if (sig != 0) {
> + clear_siginfo();
> + info.si_pid = task_tgid_vnr(current);
> + info.si_signo = sig;
> + if (sig == SIGCHLD)
> + info.si_code = CLD_PREDUMP;
> +
> + sighand = parent->sighand;
> + spin_lock_irqsave(>siglock, flags);
> + __group_send_sig_info(sig, , parent);
> + spin_unlock_irqrestore(>siglock, flags);

You can just use do_send_sig_info() and remove sighand/flags/spin_lock_irqsave.

Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid
read_lock(tasklist).

And I'd suggest to move it into coredump.c and make it static. It won't have
another user.

Oleg.



[PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-23 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification. A new signal code CLD_PREDUMP is also
defined for SIGCHLD.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
  Set the child pre-coredump signal of the calling process to
  arg2 (either a signal value in the range 1..maxsig, or 0 to
  clear). This is the signal that the calling process will get
  prior to the coredump of a child process. This value is
  cleared across execve(2), or for the child of a fork(2).

  When SIGCHLD is specified, the signal code will be set to
  CLD_PREDUMP in such an SIGCHLD signal.

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
  Return the current value of the child pre-coredump signal,
  in the location pointed to by (int *) arg2.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
v2 -> v3:

Addressed review comments from Oleg Nesterov, including:
o remove the restriction on signal for PR_SET_PREDUMP_SIG.
o code simplification

 arch/x86/kernel/signal_compat.c  |   2 +-
 fs/coredump.c|   6 +
 fs/exec.c|   3 +
 include/linux/sched/signal.h |   4 +
 include/uapi/asm-generic/siginfo.h   |   3 +-
 include/uapi/linux/prctl.h   |   4 +
 kernel/fork.c|   3 +
 kernel/signal.c  |  31 +
 kernel/sys.c |  13 ++
 tools/testing/selftests/prctl/Makefile   |   2 +-
 tools/testing/selftests/prctl/predump-sig-test.c | 169 +++
 11 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(NSIGSEGV != 7);
BUILD_BUG_ON(NSIGBUS  != 5);
BUILD_BUG_ON(NSIGTRAP != 5);
-   BUILD_BUG_ON(NSIGCHLD != 6);
+   BUILD_BUG_ON(NSIGCHLD != 7);
BUILD_BUG_ON(NSIGSYS  != 1);
 
/* This is part of the ABI and can never change in size: */
diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..d6ca1a3 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
if (retval < 0)
goto fail_creds;
 
+   /*
+* Send the pre-coredump signal to the parent if requested.
+*/
+   do_notify_parent_predump();
+   cond_resched();
+
old_cred = override_creds(cred);
 
ispipe = format_corename(, );
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
/* we have changed execution domain */
tsk->exit_signal = SIGCHLD;
 
+   /* Clear the pre-coredump signal before loading a new binary */
+   sig->predump_signal = 0;
+
 #ifdef 

[PATCH v3] kernel/signal: Signal-based pre-coredump notification

2018-10-23 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification. A new signal code CLD_PREDUMP is also
defined for SIGCHLD.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
  Set the child pre-coredump signal of the calling process to
  arg2 (either a signal value in the range 1..maxsig, or 0 to
  clear). This is the signal that the calling process will get
  prior to the coredump of a child process. This value is
  cleared across execve(2), or for the child of a fork(2).

  When SIGCHLD is specified, the signal code will be set to
  CLD_PREDUMP in such an SIGCHLD signal.

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
  Return the current value of the child pre-coredump signal,
  in the location pointed to by (int *) arg2.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
v2 -> v3:

Addressed review comments from Oleg Nesterov, including:
o remove the restriction on signal for PR_SET_PREDUMP_SIG.
o code simplification

 arch/x86/kernel/signal_compat.c  |   2 +-
 fs/coredump.c|   6 +
 fs/exec.c|   3 +
 include/linux/sched/signal.h |   4 +
 include/uapi/asm-generic/siginfo.h   |   3 +-
 include/uapi/linux/prctl.h   |   4 +
 kernel/fork.c|   3 +
 kernel/signal.c  |  31 +
 kernel/sys.c |  13 ++
 tools/testing/selftests/prctl/Makefile   |   2 +-
 tools/testing/selftests/prctl/predump-sig-test.c | 169 +++
 11 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf05..a3deba8 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(NSIGSEGV != 7);
BUILD_BUG_ON(NSIGBUS  != 5);
BUILD_BUG_ON(NSIGTRAP != 5);
-   BUILD_BUG_ON(NSIGCHLD != 6);
+   BUILD_BUG_ON(NSIGCHLD != 7);
BUILD_BUG_ON(NSIGSYS  != 1);
 
/* This is part of the ABI and can never change in size: */
diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..d6ca1a3 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo)
if (retval < 0)
goto fail_creds;
 
+   /*
+* Send the pre-coredump signal to the parent if requested.
+*/
+   do_notify_parent_predump();
+   cond_resched();
+
old_cred = override_creds(cred);
 
ispipe = format_corename(, );
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
/* we have changed execution domain */
tsk->exit_signal = SIGCHLD;
 
+   /* Clear the pre-coredump signal before loading a new binary */
+   sig->predump_signal = 0;
+
 #ifdef