Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification
Hi, Oleg: On 10/24/18 7:02 AM, Oleg Nesterov wrote: > On 10/23, Enke Chen wrote: >> >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) >> if (retval < 0) >> goto fail_creds; >> >> +/* >> + * Send the pre-coredump signal to the parent if requested. >> + */ >> +do_notify_parent_predump(); >> +cond_resched(); > > I am still not sure cond_resched() makes any sense here... > >> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, >> struct task_struct *tsk) >> tty_audit_fork(sig); >> sched_autogroup_fork(sig); >> >> +/* Clear the pre-coredump signal for the child */ >> +sig->predump_signal = 0; > > No need, copy_signal() does zalloc(). > Removed. > >> +void do_notify_parent_predump(void) >> +{ >> +struct sighand_struct *sighand; >> +struct kernel_siginfo info; >> +struct task_struct *parent; >> +unsigned long flags; >> +int sig; >> + >> +read_lock(_lock); >> +parent = current->parent; >> +sig = parent->signal->predump_signal; >> +if (sig != 0) { >> +clear_siginfo(); >> +info.si_pid = task_tgid_vnr(current); >> +info.si_signo = sig; >> +if (sig == SIGCHLD) >> +info.si_code = CLD_PREDUMP; >> + >> +sighand = parent->sighand; >> +spin_lock_irqsave(>siglock, flags); >> +__group_send_sig_info(sig, , parent); >> +spin_unlock_irqrestore(>siglock, flags); > > You can just use do_send_sig_info() and remove > sighand/flags/spin_lock_irqsave. Ok. > > Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid > read_lock(tasklist). I am not sure if we should/need to deviate from the convention (locking before access the parent). In any case it may not matter as the coredump is in the exceptional code flow. > > And I'd suggest to move it into coredump.c and make it static. It won't have > another user. Ok. Thanks. -- Enke
Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification
Hi, Oleg: On 10/24/18 7:02 AM, Oleg Nesterov wrote: > On 10/23, Enke Chen wrote: >> >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) >> if (retval < 0) >> goto fail_creds; >> >> +/* >> + * Send the pre-coredump signal to the parent if requested. >> + */ >> +do_notify_parent_predump(); >> +cond_resched(); > > I am still not sure cond_resched() makes any sense here... > >> @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, >> struct task_struct *tsk) >> tty_audit_fork(sig); >> sched_autogroup_fork(sig); >> >> +/* Clear the pre-coredump signal for the child */ >> +sig->predump_signal = 0; > > No need, copy_signal() does zalloc(). > Removed. > >> +void do_notify_parent_predump(void) >> +{ >> +struct sighand_struct *sighand; >> +struct kernel_siginfo info; >> +struct task_struct *parent; >> +unsigned long flags; >> +int sig; >> + >> +read_lock(_lock); >> +parent = current->parent; >> +sig = parent->signal->predump_signal; >> +if (sig != 0) { >> +clear_siginfo(); >> +info.si_pid = task_tgid_vnr(current); >> +info.si_signo = sig; >> +if (sig == SIGCHLD) >> +info.si_code = CLD_PREDUMP; >> + >> +sighand = parent->sighand; >> +spin_lock_irqsave(>siglock, flags); >> +__group_send_sig_info(sig, , parent); >> +spin_unlock_irqrestore(>siglock, flags); > > You can just use do_send_sig_info() and remove > sighand/flags/spin_lock_irqsave. Ok. > > Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid > read_lock(tasklist). I am not sure if we should/need to deviate from the convention (locking before access the parent). In any case it may not matter as the coredump is in the exceptional code flow. > > And I'd suggest to move it into coredump.c and make it static. It won't have > another user. Ok. Thanks. -- Enke
Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification
On 10/23, Enke Chen wrote: > > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) > if (retval < 0) > goto fail_creds; > > + /* > + * Send the pre-coredump signal to the parent if requested. > + */ > + do_notify_parent_predump(); > + cond_resched(); I am still not sure cond_resched() makes any sense here... > @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, > struct task_struct *tsk) > tty_audit_fork(sig); > sched_autogroup_fork(sig); > > + /* Clear the pre-coredump signal for the child */ > + sig->predump_signal = 0; No need, copy_signal() does zalloc(). > +void do_notify_parent_predump(void) > +{ > + struct sighand_struct *sighand; > + struct kernel_siginfo info; > + struct task_struct *parent; > + unsigned long flags; > + int sig; > + > + read_lock(_lock); > + parent = current->parent; > + sig = parent->signal->predump_signal; > + if (sig != 0) { > + clear_siginfo(); > + info.si_pid = task_tgid_vnr(current); > + info.si_signo = sig; > + if (sig == SIGCHLD) > + info.si_code = CLD_PREDUMP; > + > + sighand = parent->sighand; > + spin_lock_irqsave(>siglock, flags); > + __group_send_sig_info(sig, , parent); > + spin_unlock_irqrestore(>siglock, flags); You can just use do_send_sig_info() and remove sighand/flags/spin_lock_irqsave. Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid read_lock(tasklist). And I'd suggest to move it into coredump.c and make it static. It won't have another user. Oleg.
Re: [PATCH v3] kernel/signal: Signal-based pre-coredump notification
On 10/23, Enke Chen wrote: > > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) > if (retval < 0) > goto fail_creds; > > + /* > + * Send the pre-coredump signal to the parent if requested. > + */ > + do_notify_parent_predump(); > + cond_resched(); I am still not sure cond_resched() makes any sense here... > @@ -1553,6 +1553,9 @@ static int copy_signal(unsigned long clone_flags, > struct task_struct *tsk) > tty_audit_fork(sig); > sched_autogroup_fork(sig); > > + /* Clear the pre-coredump signal for the child */ > + sig->predump_signal = 0; No need, copy_signal() does zalloc(). > +void do_notify_parent_predump(void) > +{ > + struct sighand_struct *sighand; > + struct kernel_siginfo info; > + struct task_struct *parent; > + unsigned long flags; > + int sig; > + > + read_lock(_lock); > + parent = current->parent; > + sig = parent->signal->predump_signal; > + if (sig != 0) { > + clear_siginfo(); > + info.si_pid = task_tgid_vnr(current); > + info.si_signo = sig; > + if (sig == SIGCHLD) > + info.si_code = CLD_PREDUMP; > + > + sighand = parent->sighand; > + spin_lock_irqsave(>siglock, flags); > + __group_send_sig_info(sig, , parent); > + spin_unlock_irqrestore(>siglock, flags); You can just use do_send_sig_info() and remove sighand/flags/spin_lock_irqsave. Perhaps the "likely" predump_signal==0 check at the start makes sense to avoid read_lock(tasklist). And I'd suggest to move it into coredump.c and make it static. It won't have another user. Oleg.
[PATCH v3] kernel/signal: Signal-based pre-coredump notification
For simplicity and consistency, this patch provides an implementation for signal-based fault notification prior to the coredump of a child process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can be used by an application to express its interest and to specify the signal for such a notification. A new signal code CLD_PREDUMP is also defined for SIGCHLD. Changes to prctl(2): PR_SET_PREDUMP_SIG (since Linux 4.20.x) Set the child pre-coredump signal of the calling process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the calling process will get prior to the coredump of a child process. This value is cleared across execve(2), or for the child of a fork(2). When SIGCHLD is specified, the signal code will be set to CLD_PREDUMP in such an SIGCHLD signal. PR_GET_PREDUMP_SIG (since Linux 4.20.x) Return the current value of the child pre-coredump signal, in the location pointed to by (int *) arg2. Background: As the coredump of a process may take time, in certain time-sensitive applications it is necessary for a parent process (e.g., a process manager) to be notified of a child's imminent death before the coredump so that the parent process can act sooner, such as re-spawning an application process, or initiating a control-plane fail-over. Currently there are two ways for a parent process to be notified of a child process's state change. One is to use the POSIX signal, and another is to use the kernel connector module. The specific events and actions are summarized as follows: Process EventPOSIX SignalConnector-based -- ptrace_attach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_STOPPED ptrace_detach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_CONTINUED pre_coredump/N/A proc_coredump_connector() get_signal() post_coredump/ do_notify_parent() proc_exit_connector() do_exit()SIGCHLD / exit_signal -- As shown in the table, the signal-based pre-coredump notification is not currently available. In some cases using a connector-based notification can be quite complicated (e.g., when a process manager is written in shell scripts and thus is subject to certain inherent limitations), and a signal-based notification would be simpler and better suited. Signed-off-by: Enke Chen --- v2 -> v3: Addressed review comments from Oleg Nesterov, including: o remove the restriction on signal for PR_SET_PREDUMP_SIG. o code simplification arch/x86/kernel/signal_compat.c | 2 +- fs/coredump.c| 6 + fs/exec.c| 3 + include/linux/sched/signal.h | 4 + include/uapi/asm-generic/siginfo.h | 3 +- include/uapi/linux/prctl.h | 4 + kernel/fork.c| 3 + kernel/signal.c | 31 + kernel/sys.c | 13 ++ tools/testing/selftests/prctl/Makefile | 2 +- tools/testing/selftests/prctl/predump-sig-test.c | 169 +++ 11 files changed, 237 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 9ccbf05..a3deba8 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void) BUILD_BUG_ON(NSIGSEGV != 7); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 5); - BUILD_BUG_ON(NSIGCHLD != 6); + BUILD_BUG_ON(NSIGCHLD != 7); BUILD_BUG_ON(NSIGSYS != 1); /* This is part of the ABI and can never change in size: */ diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e..d6ca1a3 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) if (retval < 0) goto fail_creds; + /* +* Send the pre-coredump signal to the parent if requested. +*/ + do_notify_parent_predump(); + cond_resched(); + old_cred = override_creds(cred); ispipe = format_corename(, ); diff --git a/fs/exec.c b/fs/exec.c index fc281b7..7714da7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk) /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; + /* Clear the pre-coredump signal before loading a new binary */ + sig->predump_signal = 0; + #ifdef
[PATCH v3] kernel/signal: Signal-based pre-coredump notification
For simplicity and consistency, this patch provides an implementation for signal-based fault notification prior to the coredump of a child process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can be used by an application to express its interest and to specify the signal for such a notification. A new signal code CLD_PREDUMP is also defined for SIGCHLD. Changes to prctl(2): PR_SET_PREDUMP_SIG (since Linux 4.20.x) Set the child pre-coredump signal of the calling process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the calling process will get prior to the coredump of a child process. This value is cleared across execve(2), or for the child of a fork(2). When SIGCHLD is specified, the signal code will be set to CLD_PREDUMP in such an SIGCHLD signal. PR_GET_PREDUMP_SIG (since Linux 4.20.x) Return the current value of the child pre-coredump signal, in the location pointed to by (int *) arg2. Background: As the coredump of a process may take time, in certain time-sensitive applications it is necessary for a parent process (e.g., a process manager) to be notified of a child's imminent death before the coredump so that the parent process can act sooner, such as re-spawning an application process, or initiating a control-plane fail-over. Currently there are two ways for a parent process to be notified of a child process's state change. One is to use the POSIX signal, and another is to use the kernel connector module. The specific events and actions are summarized as follows: Process EventPOSIX SignalConnector-based -- ptrace_attach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_STOPPED ptrace_detach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_CONTINUED pre_coredump/N/A proc_coredump_connector() get_signal() post_coredump/ do_notify_parent() proc_exit_connector() do_exit()SIGCHLD / exit_signal -- As shown in the table, the signal-based pre-coredump notification is not currently available. In some cases using a connector-based notification can be quite complicated (e.g., when a process manager is written in shell scripts and thus is subject to certain inherent limitations), and a signal-based notification would be simpler and better suited. Signed-off-by: Enke Chen --- v2 -> v3: Addressed review comments from Oleg Nesterov, including: o remove the restriction on signal for PR_SET_PREDUMP_SIG. o code simplification arch/x86/kernel/signal_compat.c | 2 +- fs/coredump.c| 6 + fs/exec.c| 3 + include/linux/sched/signal.h | 4 + include/uapi/asm-generic/siginfo.h | 3 +- include/uapi/linux/prctl.h | 4 + kernel/fork.c| 3 + kernel/signal.c | 31 + kernel/sys.c | 13 ++ tools/testing/selftests/prctl/Makefile | 2 +- tools/testing/selftests/prctl/predump-sig-test.c | 169 +++ 11 files changed, 237 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/prctl/predump-sig-test.c diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c index 9ccbf05..a3deba8 100644 --- a/arch/x86/kernel/signal_compat.c +++ b/arch/x86/kernel/signal_compat.c @@ -30,7 +30,7 @@ static inline void signal_compat_build_tests(void) BUILD_BUG_ON(NSIGSEGV != 7); BUILD_BUG_ON(NSIGBUS != 5); BUILD_BUG_ON(NSIGTRAP != 5); - BUILD_BUG_ON(NSIGCHLD != 6); + BUILD_BUG_ON(NSIGCHLD != 7); BUILD_BUG_ON(NSIGSYS != 1); /* This is part of the ABI and can never change in size: */ diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e..d6ca1a3 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -590,6 +590,12 @@ void do_coredump(const kernel_siginfo_t *siginfo) if (retval < 0) goto fail_creds; + /* +* Send the pre-coredump signal to the parent if requested. +*/ + do_notify_parent_predump(); + cond_resched(); + old_cred = override_creds(cred); ispipe = format_corename(, ); diff --git a/fs/exec.c b/fs/exec.c index fc281b7..7714da7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk) /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; + /* Clear the pre-coredump signal before loading a new binary */ + sig->predump_signal = 0; + #ifdef