Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, Oleg: Yes, it should be the "real_parent" that is more interested in the notification. Will revert back. +static void do_notify_parent_predump(void) +{ + struct task_struct *parent; + int sig; + + rcu_read_lock(); + parent = rcu_dereference(current->real_parent); + sig = parent->signal->predump_signal; + if (sig != 0) + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); + rcu_read_unlock(); +} Thanks. -- Enke On 10/29/18 4:18 AM, Oleg Nesterov wrote: > Hi, > > On 10/26, Enke Chen wrote: >> >> This is really a good idea given that "parent" is declared as RCU-protected. >> Just a bit odd, though, that the "parent" has not been accessed this way in >> the code base. > > It is acccessed when possible, > >> So just to confirm: the revised code would look like the following: >> >> static void do_notify_parent_predump(void) >> { >> struct task_struct *parent; >> int sig; >> >> rcu_read_lock(); >> parent = rcu_dereference(current->parent); >> sig = parent->signal->predump_signal; >> if (sig != 0) >> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); >> rcu_read_unlock(); >> } > > Yes, this is what I meant. > > But I still think do_notify_parent_predump() should notify ->real_parent, > not ->parent. > > Oleg. >
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, Oleg: Yes, it should be the "real_parent" that is more interested in the notification. Will revert back. +static void do_notify_parent_predump(void) +{ + struct task_struct *parent; + int sig; + + rcu_read_lock(); + parent = rcu_dereference(current->real_parent); + sig = parent->signal->predump_signal; + if (sig != 0) + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); + rcu_read_unlock(); +} Thanks. -- Enke On 10/29/18 4:18 AM, Oleg Nesterov wrote: > Hi, > > On 10/26, Enke Chen wrote: >> >> This is really a good idea given that "parent" is declared as RCU-protected. >> Just a bit odd, though, that the "parent" has not been accessed this way in >> the code base. > > It is acccessed when possible, > >> So just to confirm: the revised code would look like the following: >> >> static void do_notify_parent_predump(void) >> { >> struct task_struct *parent; >> int sig; >> >> rcu_read_lock(); >> parent = rcu_dereference(current->parent); >> sig = parent->signal->predump_signal; >> if (sig != 0) >> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); >> rcu_read_unlock(); >> } > > Yes, this is what I meant. > > But I still think do_notify_parent_predump() should notify ->real_parent, > not ->parent. > > Oleg. >
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, On 10/26, Enke Chen wrote: > > This is really a good idea given that "parent" is declared as RCU-protected. > Just a bit odd, though, that the "parent" has not been accessed this way in > the code base. It is acccessed when possible, > So just to confirm: the revised code would look like the following: > > static void do_notify_parent_predump(void) > { > struct task_struct *parent; > int sig; > > rcu_read_lock(); > parent = rcu_dereference(current->parent); > sig = parent->signal->predump_signal; > if (sig != 0) > do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); > rcu_read_unlock(); > } Yes, this is what I meant. But I still think do_notify_parent_predump() should notify ->real_parent, not ->parent. Oleg.
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, On 10/26, Enke Chen wrote: > > This is really a good idea given that "parent" is declared as RCU-protected. > Just a bit odd, though, that the "parent" has not been accessed this way in > the code base. It is acccessed when possible, > So just to confirm: the revised code would look like the following: > > static void do_notify_parent_predump(void) > { > struct task_struct *parent; > int sig; > > rcu_read_lock(); > parent = rcu_dereference(current->parent); > sig = parent->signal->predump_signal; > if (sig != 0) > do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); > rcu_read_unlock(); > } Yes, this is what I meant. But I still think do_notify_parent_predump() should notify ->real_parent, not ->parent. Oleg.
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, Olge: This is really a good idea given that "parent" is declared as RCU-protected. Just a bit odd, though, that the "parent" has not been accessed this way in the code base. So just to confirm: the revised code would look like the following: static void do_notify_parent_predump(void) { struct task_struct *parent; int sig; rcu_read_lock(); parent = rcu_dereference(current->parent); sig = parent->signal->predump_signal; if (sig != 0) do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); rcu_read_unlock(); } Thank you so much for your help during this review. I would like to ack your contribution in the "Reviewed-by:" field. -- Enke On 10/26/18 1:28 AM, Oleg Nesterov wrote: > On 10/25, Enke Chen wrote: >> >> +static void do_notify_parent_predump(void) >> +{ >> +struct task_struct *parent; >> +int sig; >> + >> +read_lock(_lock); >> +parent = current->parent; >> +sig = parent->signal->predump_signal; >> +if (sig != 0) >> +do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); >> +read_unlock(_lock); > > Ah. It is strange I didn't think about this before... Please, do not take > tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the > rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work > fine. > > Oleg. >
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
Hi, Olge: This is really a good idea given that "parent" is declared as RCU-protected. Just a bit odd, though, that the "parent" has not been accessed this way in the code base. So just to confirm: the revised code would look like the following: static void do_notify_parent_predump(void) { struct task_struct *parent; int sig; rcu_read_lock(); parent = rcu_dereference(current->parent); sig = parent->signal->predump_signal; if (sig != 0) do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); rcu_read_unlock(); } Thank you so much for your help during this review. I would like to ack your contribution in the "Reviewed-by:" field. -- Enke On 10/26/18 1:28 AM, Oleg Nesterov wrote: > On 10/25, Enke Chen wrote: >> >> +static void do_notify_parent_predump(void) >> +{ >> +struct task_struct *parent; >> +int sig; >> + >> +read_lock(_lock); >> +parent = current->parent; >> +sig = parent->signal->predump_signal; >> +if (sig != 0) >> +do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); >> +read_unlock(_lock); > > Ah. It is strange I didn't think about this before... Please, do not take > tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the > rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work > fine. > > Oleg. >
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
On 10/25, Enke Chen wrote: > > +static void do_notify_parent_predump(void) > +{ > + struct task_struct *parent; > + int sig; > + > + read_lock(_lock); > + parent = current->parent; > + sig = parent->signal->predump_signal; > + if (sig != 0) > + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); > + read_unlock(_lock); Ah. It is strange I didn't think about this before... Please, do not take tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work fine. Oleg.
Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification
On 10/25, Enke Chen wrote: > > +static void do_notify_parent_predump(void) > +{ > + struct task_struct *parent; > + int sig; > + > + read_lock(_lock); > + parent = current->parent; > + sig = parent->signal->predump_signal; > + if (sig != 0) > + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); > + read_unlock(_lock); Ah. It is strange I didn't think about this before... Please, do not take tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work fine. Oleg.
[PATCH v4] kernel/signal: Signal-based pre-coredump notification
For simplicity and consistency, this patch provides an implementation for signal-based fault notification prior to the coredump of a child process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can be used by an application to express its interest and to specify the signal for such a notification. Changes to prctl(2): PR_SET_PREDUMP_SIG (since Linux 4.20.x) Set the child pre-coredump signal of the calling process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the calling process will get prior to the coredump of a child process. This value is cleared across execve(2), or for the child of a fork(2). PR_GET_PREDUMP_SIG (since Linux 4.20.x) Return the current value of the child pre-coredump signal, in the location pointed to by (int *) arg2. Background: As the coredump of a process may take time, in certain time-sensitive applications it is necessary for a parent process (e.g., a process manager) to be notified of a child's imminent death before the coredump so that the parent process can act sooner, such as re-spawning an application process, or initiating a control-plane fail-over. One application is BFD. The early fault notification is a critical component for maintaining BFD sessions (with a timeout value of 50 msec or 100 msec) across a control-plane failure. Currently there are two ways for a parent process to be notified of a child process's state change. One is to use the POSIX signal, and another is to use the kernel connector module. The specific events and actions are summarized as follows: Process EventPOSIX SignalConnector-based -- ptrace_attach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_STOPPED ptrace_detach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_CONTINUED pre_coredump/N/A proc_coredump_connector() get_signal() post_coredump/ do_notify_parent() proc_exit_connector() do_exit()SIGCHLD / exit_signal -- As shown in the table, the signal-based pre-coredump notification is not currently available. In some cases using a connector-based notification can be quite complicated (e.g., when a process manager is written in shell scripts and thus is subject to certain inherent limitations), and a signal-based notification would be simpler and better suited. Signed-off-by: Enke Chen --- v3 -> v4: Addressed review comments from Oleg Nesterov, and Eric W. Biederman, including: o remove the definition CLD_PREDUMP. o code simplification. o split out the selftest code. fs/coredump.c| 23 +++ fs/exec.c| 3 +++ include/linux/sched/signal.h | 3 +++ include/uapi/linux/prctl.h | 4 kernel/sys.c | 13 + 5 files changed, 46 insertions(+) diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e..22c40dc 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) return err; } +/* + * While do_notify_parent() notifies the parent of a child's death post + * its coredump, this function lets the parent (if so desired) know about + * the imminent death of a child just prior to its coredump. + */ +static void do_notify_parent_predump(void) +{ + struct task_struct *parent; + int sig; + + read_lock(_lock); + parent = current->parent; + sig = parent->signal->predump_signal; + if (sig != 0) + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); + read_unlock(_lock); +} + void do_coredump(const kernel_siginfo_t *siginfo) { struct core_state core_state; @@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo) if (retval < 0) goto fail_creds; + /* +* Send the pre-coredump signal to the parent if requested. +*/ + do_notify_parent_predump(); + old_cred = override_creds(cred); ispipe = format_corename(, ); diff --git a/fs/exec.c b/fs/exec.c index fc281b7..7714da7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk) /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; + /* Clear the pre-coredump signal before loading a new binary */ + sig->predump_signal = 0; + #ifdef CONFIG_POSIX_TIMERS exit_itimers(sig); flush_itimer_signals(); diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 13789d1..728ef68 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -112,6 +112,9 @@ struct signal_struct {
[PATCH v4] kernel/signal: Signal-based pre-coredump notification
For simplicity and consistency, this patch provides an implementation for signal-based fault notification prior to the coredump of a child process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can be used by an application to express its interest and to specify the signal for such a notification. Changes to prctl(2): PR_SET_PREDUMP_SIG (since Linux 4.20.x) Set the child pre-coredump signal of the calling process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the calling process will get prior to the coredump of a child process. This value is cleared across execve(2), or for the child of a fork(2). PR_GET_PREDUMP_SIG (since Linux 4.20.x) Return the current value of the child pre-coredump signal, in the location pointed to by (int *) arg2. Background: As the coredump of a process may take time, in certain time-sensitive applications it is necessary for a parent process (e.g., a process manager) to be notified of a child's imminent death before the coredump so that the parent process can act sooner, such as re-spawning an application process, or initiating a control-plane fail-over. One application is BFD. The early fault notification is a critical component for maintaining BFD sessions (with a timeout value of 50 msec or 100 msec) across a control-plane failure. Currently there are two ways for a parent process to be notified of a child process's state change. One is to use the POSIX signal, and another is to use the kernel connector module. The specific events and actions are summarized as follows: Process EventPOSIX SignalConnector-based -- ptrace_attach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_STOPPED ptrace_detach() do_notify_parent_cldstop() proc_ptrace_connector() SIGCHLD / CLD_CONTINUED pre_coredump/N/A proc_coredump_connector() get_signal() post_coredump/ do_notify_parent() proc_exit_connector() do_exit()SIGCHLD / exit_signal -- As shown in the table, the signal-based pre-coredump notification is not currently available. In some cases using a connector-based notification can be quite complicated (e.g., when a process manager is written in shell scripts and thus is subject to certain inherent limitations), and a signal-based notification would be simpler and better suited. Signed-off-by: Enke Chen --- v3 -> v4: Addressed review comments from Oleg Nesterov, and Eric W. Biederman, including: o remove the definition CLD_PREDUMP. o code simplification. o split out the selftest code. fs/coredump.c| 23 +++ fs/exec.c| 3 +++ include/linux/sched/signal.h | 3 +++ include/uapi/linux/prctl.h | 4 kernel/sys.c | 13 + 5 files changed, 46 insertions(+) diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e..22c40dc 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) return err; } +/* + * While do_notify_parent() notifies the parent of a child's death post + * its coredump, this function lets the parent (if so desired) know about + * the imminent death of a child just prior to its coredump. + */ +static void do_notify_parent_predump(void) +{ + struct task_struct *parent; + int sig; + + read_lock(_lock); + parent = current->parent; + sig = parent->signal->predump_signal; + if (sig != 0) + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID); + read_unlock(_lock); +} + void do_coredump(const kernel_siginfo_t *siginfo) { struct core_state core_state; @@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo) if (retval < 0) goto fail_creds; + /* +* Send the pre-coredump signal to the parent if requested. +*/ + do_notify_parent_predump(); + old_cred = override_creds(cred); ispipe = format_corename(, ); diff --git a/fs/exec.c b/fs/exec.c index fc281b7..7714da7 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk) /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; + /* Clear the pre-coredump signal before loading a new binary */ + sig->predump_signal = 0; + #ifdef CONFIG_POSIX_TIMERS exit_itimers(sig); flush_itimer_signals(); diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 13789d1..728ef68 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -112,6 +112,9 @@ struct signal_struct {