Re: [PATCH v3 1/2] pid: add pidfd_open()

2019-06-27 Thread Konstantin Khlebnikov

On 20.05.2019 18:56, Christian Brauner wrote:

This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
create pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this API. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes, we enable them to adopt this API.
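
The two uses named above (exit notification by polling a pidfd, and signal
sending through a pidfd) can be sketched from userspace roughly as follows.
This is only an illustration under stated assumptions: the wrapper names
my_pidfd_open(), my_pidfd_send_signal() and pidfd_wait_exit() are hypothetical,
the raw syscall() interface is used because a libc wrapper may not be
available, and the fallback numbers 434 and 424 are assumed to match the
architecture (alpha, for example, uses 544 for pidfd_open() in this patch).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 434 is the number discussed in this thread (not valid on alpha). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
/* Assumption: 424 is the pidfd_send_signal() number on most architectures. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Hypothetical wrapper: returns a pidfd for an existing PID, or -1 with errno set. */
static int my_pidfd_open(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/* Hypothetical wrapper: sends sig to the process the pidfd refers to. */
static int my_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
				unsigned int flags)
{
	return (int)syscall(SYS_pidfd_send_signal, pidfd, sig, info, flags);
}

/*
 * Waits up to timeout_ms for the process behind pidfd to exit.
 * Returns 1 if it has exited, 0 on timeout, -1 on error.
 */
static int pidfd_wait_exit(int pidfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int n;

	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A supervisor that is not the parent could call my_pidfd_open() once for a PID
it learned elsewhere, then use pidfd_wait_exit() instead of re-checking the
PID, which avoids acting on a recycled PID.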

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.


As far as I can see, pidfd_open() works only within the current pid namespace.

Have you considered a separate argument for a pidns fd, or a flag for opening
the pid in the pid namespace referred to by nsproxy->pid_ns_for_children (as
set by setns())?

This could serve the use cases I tried to cover with the "translate_pid" syscall:
https://lkml.org/lkml/2018/6/1/788
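
For context on the translation problem, here is a rough userspace sketch of
one direction of that mapping as it can be read today. It is neither the
proposed translate_pid() interface nor part of this patch; it only parses the
NSpid: line that procfs exposes in a status file (Linux 4.1 and later), where
the last entry is the PID as seen in the process's own pid namespace. The
function name nspid_innermost() is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: reads a procfs status file (for example
 * "/proc/1234/status") and returns the last value on its "NSpid:" line,
 * i.e. the PID in the innermost pid namespace of that process.
 * Returns -1 if the file cannot be read or has no NSpid: line.
 */
static long nspid_innermost(const char *status_path)
{
	char line[256];
	long inner = -1;
	FILE *f = fopen(status_path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		char *p, *end;

		if (strncmp(line, "NSpid:", 6) != 0)
			continue;

		/* Walk all whitespace-separated numbers, keeping the last one. */
		p = line + 6;
		for (;;) {
			long v = strtol(p, &end, 10);

			if (end == p)
				break;
			inner = v;
			p = end;
		}
		break;
	}

	fclose(f);
	return inner;
}
```

This only works for processes visible in the caller's procfs mount and it is
racy with respect to PID reuse, which is part of why a pidfd-based or
syscall-based interface is attractive.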



Signed-off-by: Christian Brauner 
Reviewed-by: Oleg Nesterov 
Acked-by: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Joel Fernandes (Google) 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
v1:
- kbuild test robot :
   - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov :
   - use simpler thread-group leader check
v2:
- Oleg Nesterov :
   - avoid using additional variable
   - remove unneeded comment
- Arnd Bergmann :
   - switch from 428 to 434 since the new mount api has taken it
   - bump syscall numbers in arch/arm64/include/asm/unistd.h
- Joel Fernandes (Google) :
   - switch from ESRCH to EINVAL when the passed-in pid does not refer to a
     thread-group leader
- Christian Brauner :
   - rebase on v5.2-rc1
   - adapt syscall number to account for new mount api syscalls
v3:
- Arnd Bergmann :
   - add missing syscall entries for mips-o32 and mips-n64
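
As an illustration of the error-code change noted under v2, a caller might
distinguish the failure modes roughly like this. It is a sketch, not code from
the patch: the enum and try_pidfd_open() are hypothetical names, the fallback
number 434 is assumed to match the architecture, and the ESRCH case (no such
process) reflects the usual convention rather than anything stated in this
changelog.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 434 is the number discussed in this thread (not valid on alpha). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical result codes for the sketch below. */
enum pfo_status {
	PFO_OK,          /* *fd_out holds a valid pidfd */
	PFO_NO_PROCESS,  /* ESRCH: no process with that PID */
	PFO_INVALID,     /* EINVAL: bad pid/flags, or pid is not a thread-group leader */
	PFO_UNSUPPORTED, /* ENOSYS: kernel lacks pidfd_open() */
	PFO_OTHER        /* anything else, errno left untouched */
};

/* Tries to open a pidfd for pid and classifies the outcome. */
static enum pfo_status try_pidfd_open(pid_t pid, int *fd_out)
{
	int fd = (int)syscall(SYS_pidfd_open, pid, 0u);

	if (fd >= 0) {
		*fd_out = fd;
		return PFO_OK;
	}

	*fd_out = -1;
	switch (errno) {
	case ESRCH:
		return PFO_NO_PROCESS;
	case EINVAL:
		return PFO_INVALID;
	case ENOSYS:
		return PFO_UNSUPPORTED;
	default:
		return PFO_OTHER;
	}
}
```

With this split, a caller that passes a thread ID by mistake sees PFO_INVALID
rather than a result suggesting the process has gone away.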
---
  arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
  arch/arm/tools/syscall.tbl                  |  1 +
  arch/arm64/include/asm/unistd.h             |  2 +-
  arch/arm64/include/asm/unistd32.h           |  2 +
  arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
  arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
  arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
  arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
  arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
  arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
  arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
  arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
  arch/s390/kernel/syscalls/syscall.tbl       |  1 +
  arch/sh/kernel/syscalls/syscall.tbl         |  1 +
  arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
  arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
  arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
  arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
  include/linux/pid.h                         |  1 +
  include/linux/syscalls.h                    |  1 +
  include/uapi/asm-generic/unistd.h           |  4 +-
  kernel/fork.c                               |  2 +-
  kernel/pid.c                                | 43 +
  23 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 9e7704e44f6d..1db9bbcfb84e 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
  541   common  fsconfig        sys_fsconfig
  542   common  fsmount         sys_fsmount
  543   common  fspick          sys_fspick
+544    common  pidfd_open      sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..81e6e1817c45 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
  431   common  fsconfig        sys_fsconfig
  432   common  fsmount         sys_fsmount
  433   common  fspick          sys_fspick
+434

Re: [PATCH v3 1/2] pid: add pidfd_open()

2019-05-24 Thread Christian Brauner
On Tue, May 21, 2019 at 04:32:20PM +0200, Christian Brauner wrote:
> On Mon, May 20, 2019 at 05:56:29PM +0200, Christian Brauner wrote:
> > This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> > pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> > process that is created via traditional fork()/clone() calls that is only
> > referenced by a PID:
> > 
> > int pidfd = pidfd_open(1234, 0);
> > ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
> > 
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > create pidfds at process creation time.
> > However, a lot of processes get created with traditional PID-based calls
> > such as fork() or clone() (without CLONE_PIDFD). For these processes a
> > caller can currently not create a pollable pidfd. This is a problem for
> > Android's low memory killer (LMK) and service managers such as systemd.
> > Both are examples of tools that want to make use of pidfds to get reliable
> > notification of process exit for non-parents (pidfd polling) and race-free
> > signal sending (pidfd_send_signal()). They intend to switch to this API for
> > process supervision/management as soon as possible. Having no way to get
> > pollable pidfds from PID-only processes is one of the biggest blockers for
> > them in adopting this API. With pidfd_open() making it possible to retrieve
> > pidfds for PID-based processes, we enable them to adopt this API.
> > 
> > In line with Arnd's recent changes to consolidate syscall numbers across
> > architectures, I have added the pidfd_open() syscall to all architectures
> > at the same time.
> > 
> > Signed-off-by: Christian Brauner 
> > Reviewed-by: Oleg Nesterov 
> 
> This now also carries a Reviewed-by from David.
> 
> > Acked-by: Arnd Bergmann 
> > Cc: "Eric W. Biederman" 
> > Cc: Kees Cook 
> > Cc: Joel Fernandes (Google) 
> > Cc: Thomas Gleixner 
> > Cc: Jann Horn 
> > Cc: David Howells 
> > Cc: Andy Lutomirsky 
> > Cc: Andrew Morton 
> > Cc: Aleksa Sarai 
> > Cc: Linus Torvalds 
> > Cc: Al Viro 
> > Cc: linux-...@vger.kernel.org
> 
> I've moved pidfd_open() into my for-next branch together with Joel's
> pidfd polling changes. Everything is based on v5.2-rc1.
> 
> The chosen syscall number for now is 434. David is going to send out
> another pile of mount api related syscalls. I'll coordinate with him
> accordingly prior to the 5.3 merge window.

After talking to Arnd, I split the syscall addition and the per-arch
wiring-up of pidfd_open() into two patches. There are no functional
changes and everything is still sitting in for-next.

Thanks!
Christian


Re: [PATCH v3 1/2] pid: add pidfd_open()

2019-05-21 Thread Christian Brauner
On Mon, May 20, 2019 at 05:56:29PM +0200, Christian Brauner wrote:
> This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> process that is created via traditional fork()/clone() calls that is only
> referenced by a PID:
> 
> int pidfd = pidfd_open(1234, 0);
> ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
> 
> With the introduction of pidfds through CLONE_PIDFD it is possible to
> create pidfds at process creation time.
> However, a lot of processes get created with traditional PID-based calls
> such as fork() or clone() (without CLONE_PIDFD). For these processes a
> caller can currently not create a pollable pidfd. This is a problem for
> Android's low memory killer (LMK) and service managers such as systemd.
> Both are examples of tools that want to make use of pidfds to get reliable
> notification of process exit for non-parents (pidfd polling) and race-free
> signal sending (pidfd_send_signal()). They intend to switch to this API for
> process supervision/management as soon as possible. Having no way to get
> pollable pidfds from PID-only processes is one of the biggest blockers for
> them in adopting this API. With pidfd_open() making it possible to retrieve
> pidfds for PID-based processes, we enable them to adopt this API.
> 
> In line with Arnd's recent changes to consolidate syscall numbers across
> architectures, I have added the pidfd_open() syscall to all architectures
> at the same time.
> 
> Signed-off-by: Christian Brauner 
> Reviewed-by: Oleg Nesterov 

This now also carries a Reviewed-by from David.

> Acked-by: Arnd Bergmann 
> Cc: "Eric W. Biederman" 
> Cc: Kees Cook 
> Cc: Joel Fernandes (Google) 
> Cc: Thomas Gleixner 
> Cc: Jann Horn 
> Cc: David Howells 
> Cc: Andy Lutomirsky 
> Cc: Andrew Morton 
> Cc: Aleksa Sarai 
> Cc: Linus Torvalds 
> Cc: Al Viro 
> Cc: linux-...@vger.kernel.org

I've moved pidfd_open() into my for-next branch together with Joel's
pidfd polling changes. Everything is based on v5.2-rc1.

The chosen syscall number for now is 434. David is going to send out
another pile of mount api related syscalls. I'll coordinate with him
accordingly prior to the 5.3 merge window.

Thanks!
Christian


[PATCH v3 1/2] pid: add pidfd_open()

2019-05-20 Thread Christian Brauner
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
create pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this API. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes, we enable them to adopt this API.

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner 
Reviewed-by: Oleg Nesterov 
Acked-by: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Joel Fernandes (Google) 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
v1:
- kbuild test robot :
  - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov :
  - use simpler thread-group leader check
v2:
- Oleg Nesterov :
  - avoid using additional variable
  - remove unneeded comment
- Arnd Bergmann :
  - switch from 428 to 434 since the new mount api has taken it
  - bump syscall numbers in arch/arm64/include/asm/unistd.h
- Joel Fernandes (Google) :
  - switch from ESRCH to EINVAL when the passed-in pid does not refer to a
    thread-group leader
- Christian Brauner :
  - rebase on v5.2-rc1
  - adapt syscall number to account for new mount api syscalls
v3:
- Arnd Bergmann :
  - add missing syscall entries for mips-o32 and mips-n64
---
 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/include/asm/unistd.h             |  2 +-
 arch/arm64/include/asm/unistd32.h           |  2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 include/linux/pid.h                         |  1 +
 include/linux/syscalls.h                    |  1 +
 include/uapi/asm-generic/unistd.h           |  4 +-
 kernel/fork.c                               |  2 +-
 kernel/pid.c                                | 43 +
 23 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 9e7704e44f6d..1db9bbcfb84e 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
 541    common  fsconfig        sys_fsconfig
 542    common  fsmount         sys_fsmount
 543    common  fspick          sys_fspick
+544    common  pidfd_open      sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..81e6e1817c45 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
 431    common  fsconfig        sys_fsconfig
 432    common  fsmount         sys_fsmount
 433    common  fspick          sys_fspick
+434    common  pidfd_open      sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..e8f7d95a1481 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
 #define __ARM_NR_compat_set_tls         (__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END             (__ARM_NR_COMPAT_BASE +