Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-29 Thread Enke Chen
Hi, Oleg:

Yes, it should be the "real_parent" that is more interested in the notification.
Will revert back.

+static void do_notify_parent_predump(void)
+{
+   struct task_struct *parent;
+   int sig;
+
+   rcu_read_lock();
+   parent = rcu_dereference(current->real_parent);
+   sig = parent->signal->predump_signal;
+   if (sig != 0)
+   do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
+   rcu_read_unlock();
+}

Thanks.  -- Enke

On 10/29/18 4:18 AM, Oleg Nesterov wrote:
> Hi,
> 
> On 10/26, Enke Chen wrote:
>>
>> This is really a good idea given that "parent" is declared as RCU-protected.
>> Just a bit odd, though, that the "parent" has not been accessed this way in
>> the code base.
> 
> It is acccessed when possible,
> 
>> So just to confirm: the revised code would look like the following:
>>
>> static void do_notify_parent_predump(void)
>> {
>> struct task_struct *parent;
>> int sig;
>>
>> rcu_read_lock();
>> parent = rcu_dereference(current->parent);
>> sig = parent->signal->predump_signal;
>> if (sig != 0)
>> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
>> rcu_read_unlock();
>> }
> 
> Yes, this is what I meant.
> 
> But I still think do_notify_parent_predump() should notify ->real_parent,
> not ->parent.
> 
> Oleg.
> 


Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-29 Thread Enke Chen
Hi, Oleg:

Yes, it should be the "real_parent" that is more interested in the notification.
Will revert back.

+static void do_notify_parent_predump(void)
+{
+   struct task_struct *parent;
+   int sig;
+
+   rcu_read_lock();
+   parent = rcu_dereference(current->real_parent);
+   sig = parent->signal->predump_signal;
+   if (sig != 0)
+   do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
+   rcu_read_unlock();
+}

Thanks.  -- Enke

On 10/29/18 4:18 AM, Oleg Nesterov wrote:
> Hi,
> 
> On 10/26, Enke Chen wrote:
>>
>> This is really a good idea given that "parent" is declared as RCU-protected.
>> Just a bit odd, though, that the "parent" has not been accessed this way in
>> the code base.
> 
> It is acccessed when possible,
> 
>> So just to confirm: the revised code would look like the following:
>>
>> static void do_notify_parent_predump(void)
>> {
>> struct task_struct *parent;
>> int sig;
>>
>> rcu_read_lock();
>> parent = rcu_dereference(current->parent);
>> sig = parent->signal->predump_signal;
>> if (sig != 0)
>> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
>> rcu_read_unlock();
>> }
> 
> Yes, this is what I meant.
> 
> But I still think do_notify_parent_predump() should notify ->real_parent,
> not ->parent.
> 
> Oleg.
> 


Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-29 Thread Oleg Nesterov
Hi,

On 10/26, Enke Chen wrote:
>
> This is really a good idea given that "parent" is declared as RCU-protected.
> Just a bit odd, though, that the "parent" has not been accessed this way in
> the code base.

It is acccessed when possible,

> So just to confirm: the revised code would look like the following:
>
> static void do_notify_parent_predump(void)
> {
> struct task_struct *parent;
> int sig;
>
> rcu_read_lock();
> parent = rcu_dereference(current->parent);
> sig = parent->signal->predump_signal;
> if (sig != 0)
> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
> rcu_read_unlock();
> }

Yes, this is what I meant.

But I still think do_notify_parent_predump() should notify ->real_parent,
not ->parent.

Oleg.



Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-29 Thread Oleg Nesterov
Hi,

On 10/26, Enke Chen wrote:
>
> This is really a good idea given that "parent" is declared as RCU-protected.
> Just a bit odd, though, that the "parent" has not been accessed this way in
> the code base.

It is acccessed when possible,

> So just to confirm: the revised code would look like the following:
>
> static void do_notify_parent_predump(void)
> {
> struct task_struct *parent;
> int sig;
>
> rcu_read_lock();
> parent = rcu_dereference(current->parent);
> sig = parent->signal->predump_signal;
> if (sig != 0)
> do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
> rcu_read_unlock();
> }

Yes, this is what I meant.

But I still think do_notify_parent_predump() should notify ->real_parent,
not ->parent.

Oleg.



Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-26 Thread Enke Chen
Hi, Olge:

This is really a good idea given that "parent" is declared as RCU-protected.
Just a bit odd, though, that the "parent" has not been accessed this way in
the code base.

So just to confirm: the revised code would look like the following:

static void do_notify_parent_predump(void)
{
struct task_struct *parent;
int sig;

rcu_read_lock();
parent = rcu_dereference(current->parent);
sig = parent->signal->predump_signal;
if (sig != 0)
do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
rcu_read_unlock();
}

Thank you so much for your help during this review. I would like to ack your
contribution in the "Reviewed-by:" field.

-- Enke

On 10/26/18 1:28 AM, Oleg Nesterov wrote:
> On 10/25, Enke Chen wrote:
>>
>> +static void do_notify_parent_predump(void)
>> +{
>> +struct task_struct *parent;
>> +int sig;
>> +
>> +read_lock(_lock);
>> +parent = current->parent;
>> +sig = parent->signal->predump_signal;
>> +if (sig != 0)
>> +do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
>> +read_unlock(_lock);
> 
> Ah. It is strange I didn't think about this before... Please, do not take
> tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the
> rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work
> fine.
> 
> Oleg.
> 


Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-26 Thread Enke Chen
Hi, Olge:

This is really a good idea given that "parent" is declared as RCU-protected.
Just a bit odd, though, that the "parent" has not been accessed this way in
the code base.

So just to confirm: the revised code would look like the following:

static void do_notify_parent_predump(void)
{
struct task_struct *parent;
int sig;

rcu_read_lock();
parent = rcu_dereference(current->parent);
sig = parent->signal->predump_signal;
if (sig != 0)
do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
rcu_read_unlock();
}

Thank you so much for your help during this review. I would like to ack your
contribution in the "Reviewed-by:" field.

-- Enke

On 10/26/18 1:28 AM, Oleg Nesterov wrote:
> On 10/25, Enke Chen wrote:
>>
>> +static void do_notify_parent_predump(void)
>> +{
>> +struct task_struct *parent;
>> +int sig;
>> +
>> +read_lock(_lock);
>> +parent = current->parent;
>> +sig = parent->signal->predump_signal;
>> +if (sig != 0)
>> +do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
>> +read_unlock(_lock);
> 
> Ah. It is strange I didn't think about this before... Please, do not take
> tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the
> rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work
> fine.
> 
> Oleg.
> 


Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-26 Thread Oleg Nesterov
On 10/25, Enke Chen wrote:
>
> +static void do_notify_parent_predump(void)
> +{
> + struct task_struct *parent;
> + int sig;
> +
> + read_lock(_lock);
> + parent = current->parent;
> + sig = parent->signal->predump_signal;
> + if (sig != 0)
> + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
> + read_unlock(_lock);

Ah. It is strange I didn't think about this before... Please, do not take
tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the
rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work
fine.

Oleg.



Re: [PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-26 Thread Oleg Nesterov
On 10/25, Enke Chen wrote:
>
> +static void do_notify_parent_predump(void)
> +{
> + struct task_struct *parent;
> + int sig;
> +
> + read_lock(_lock);
> + parent = current->parent;
> + sig = parent->signal->predump_signal;
> + if (sig != 0)
> + do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
> + read_unlock(_lock);

Ah. It is strange I didn't think about this before... Please, do not take
tasklist_lock, use rcu_read_lock() instead. do_send_sig_info() uses the
rcu-friendly lock_task_sighand(), so rcu_dereference(parent) should work
fine.

Oleg.



[PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-25 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
  Set the child pre-coredump signal of the calling process to
  arg2 (either a signal value in the range 1..maxsig, or 0 to
  clear). This is the signal that the calling process will get
  prior to the coredump of a child process. This value is
  cleared across execve(2), or for the child of a fork(2).

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
  Return the current value of the child pre-coredump signal,
  in the location pointed to by (int *) arg2.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

One application is BFD. The early fault notification is a critical
component for maintaining BFD sessions (with a timeout value of
50 msec or 100 msec) across a control-plane failure.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
v3 -> v4:

Addressed review comments from Oleg Nesterov, and Eric W. Biederman,
including:
o remove the definition CLD_PREDUMP.
o code simplification.
o split out the selftest code.

 fs/coredump.c| 23 +++
 fs/exec.c|  3 +++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/sys.c | 13 +
 5 files changed, 46 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..22c40dc 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, 
struct cred *new)
return err;
 }
 
+/*
+ * While do_notify_parent() notifies the parent of a child's death post
+ * its coredump, this function lets the parent (if so desired) know about
+ * the imminent death of a child just prior to its coredump.
+ */
+static void do_notify_parent_predump(void)
+{
+   struct task_struct *parent;
+   int sig;
+
+   read_lock(_lock);
+   parent = current->parent;
+   sig = parent->signal->predump_signal;
+   if (sig != 0)
+   do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
+   read_unlock(_lock);
+}
+
 void do_coredump(const kernel_siginfo_t *siginfo)
 {
struct core_state core_state;
@@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo)
if (retval < 0)
goto fail_creds;
 
+   /*
+* Send the pre-coredump signal to the parent if requested.
+*/
+   do_notify_parent_predump();
+
old_cred = override_creds(cred);
 
ispipe = format_corename(, );
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
/* we have changed execution domain */
tsk->exit_signal = SIGCHLD;
 
+   /* Clear the pre-coredump signal before loading a new binary */
+   sig->predump_signal = 0;
+
 #ifdef CONFIG_POSIX_TIMERS
exit_itimers(sig);
flush_itimer_signals();
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d1..728ef68 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -112,6 +112,9 @@ struct signal_struct {
 

[PATCH v4] kernel/signal: Signal-based pre-coredump notification

2018-10-25 Thread Enke Chen
For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
  Set the child pre-coredump signal of the calling process to
  arg2 (either a signal value in the range 1..maxsig, or 0 to
  clear). This is the signal that the calling process will get
  prior to the coredump of a child process. This value is
  cleared across execve(2), or for the child of a fork(2).

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
  Return the current value of the child pre-coredump signal,
  in the location pointed to by (int *) arg2.

Background:

As the coredump of a process may take time, in certain time-sensitive
applications it is necessary for a parent process (e.g., a process
manager) to be notified of a child's imminent death before the coredump
so that the parent process can act sooner, such as re-spawning an
application process, or initiating a control-plane fail-over.

One application is BFD. The early fault notification is a critical
component for maintaining BFD sessions (with a timeout value of
50 msec or 100 msec) across a control-plane failure.

Currently there are two ways for a parent process to be notified of a
child process's state change. One is to use the POSIX signal, and
another is to use the kernel connector module. The specific events and
actions are summarized as follows:

Process EventPOSIX SignalConnector-based
--
ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_STOPPED

ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
 SIGCHLD / CLD_CONTINUED

pre_coredump/N/A proc_coredump_connector()
get_signal()

post_coredump/   do_notify_parent()  proc_exit_connector()
do_exit()SIGCHLD / exit_signal
--

As shown in the table, the signal-based pre-coredump notification is not
currently available. In some cases using a connector-based notification
can be quite complicated (e.g., when a process manager is written in shell
scripts and thus is subject to certain inherent limitations), and a
signal-based notification would be simpler and better suited.

Signed-off-by: Enke Chen 
---
v3 -> v4:

Addressed review comments from Oleg Nesterov, and Eric W. Biederman,
including:
o remove the definition CLD_PREDUMP.
o code simplification.
o split out the selftest code.

 fs/coredump.c| 23 +++
 fs/exec.c|  3 +++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 
 kernel/sys.c | 13 +
 5 files changed, 46 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..22c40dc 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, 
struct cred *new)
return err;
 }
 
+/*
+ * While do_notify_parent() notifies the parent of a child's death post
+ * its coredump, this function lets the parent (if so desired) know about
+ * the imminent death of a child just prior to its coredump.
+ */
+static void do_notify_parent_predump(void)
+{
+   struct task_struct *parent;
+   int sig;
+
+   read_lock(_lock);
+   parent = current->parent;
+   sig = parent->signal->predump_signal;
+   if (sig != 0)
+   do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
+   read_unlock(_lock);
+}
+
 void do_coredump(const kernel_siginfo_t *siginfo)
 {
struct core_state core_state;
@@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo)
if (retval < 0)
goto fail_creds;
 
+   /*
+* Send the pre-coredump signal to the parent if requested.
+*/
+   do_notify_parent_predump();
+
old_cred = override_creds(cred);
 
ispipe = format_corename(, );
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
/* we have changed execution domain */
tsk->exit_signal = SIGCHLD;
 
+   /* Clear the pre-coredump signal before loading a new binary */
+   sig->predump_signal = 0;
+
 #ifdef CONFIG_POSIX_TIMERS
exit_itimers(sig);
flush_itimer_signals();
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d1..728ef68 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -112,6 +112,9 @@ struct signal_struct {