Re: [GIT PULL 3/4] ARM: SoC-related driver updates

2019-05-16 Thread Olof Johansson
On Thu, May 16, 2019 at 9:27 AM Linus Torvalds
 wrote:
>
> On Wed, May 15, 2019 at 11:43 PM Olof Johansson  wrote:
> >
> > Various driver updates for platforms and a couple of the small driver
> > subsystems we merge through our tree:
>
> Hmm. This moved the aspeed drivers from drivers/misc to
> drivers/soc/aspeed (in commit 524feb799408 "soc: add aspeed folder and
> misc drivers"), but in the meantime we also had a new aspeed soc
> driver added (in commit 01c60dcea9f7 "drivers/misc: Add Aspeed P2A
> control driver").
>
> I ended up resolving that "conflict" by moving the new aspeed P2A
> control driver to be with the other aspeed drivers too. That seemed to
> be the cleanest model.

Yeah, that's the approach we're heading towards with aspeed.

Part of the reason for why I wasn't 100% sure we wanted to move all
drivers over, is that most of drivers/soc has been for "soc glue
logic" code, not for the little SoC-specific drivers where we've
pushed hard to get out into their best-matching driver directories
instead.

Aspeed is an unusually "messy" SoC in that it has a handful of little
widgets used to communicate with the host (in its role as BMC), and
either we'd squint and put all of them in drivers/misc, or we could
pick them up in drivers/soc as we're now doing. Either way the code
will be in the kernel, and keeping it together might not be a bad
idea.

We might get more of a kitchen sink in drivers/soc over time with this
slight change in approach, but we've dealt with messes before and if
it happens, we'll clean it up when it gets too bad. Sometimes letting
it happen is the best way of seeing the bigger picture and not
over-engineer something upfront.

> I'm used to doing these kinds of fixups in a merge, but I have to
> admit that maybe I should have made it a separate commit, because now
> it's kind of non-obvious, and it's sometimes harder to see changes
> that are in a merge commit than in a separate commit.
>
> In particular, it looks like "git log --follow" is not smart enough to
> follow a rename through a merge. But I think that is a git problem,
> and not a very serious one at that ("git blame" has no such problem).
>
> And it means that now the merge has
>
>  drivers/{misc => soc/aspeed}/aspeed-lpc-ctrl.c   |   0
>  drivers/{misc => soc/aspeed}/aspeed-lpc-snoop.c  |   0
>  drivers/{misc => soc/aspeed}/aspeed-p2a-ctrl.c   |   0
>
> when you do "git show --stat" on it, which looks correct, and it feels
> like conceptually the right merge resolution to me.
>
> Sending out this explanatory email to everybody involved, just so that
> this doesn't take you by surprise. But it looks like Patrick Venture
> is not just the author of that moved driver, he was also involved in
> the move of the two other drivers, so I'm guessing there's not going
> to be a lot of confusion here.

Yeah. I think that's fine in this case.

I've got some horror stories from botched rebases where merges ended
up containing actual code changes and that caused immense confusion,
but that's not the case here.

> HOWEVER. More subtly, as part of my *testing* for this, I also
> realized that commit 524feb799408 is buggy. In my tests, the config
> worked fine, but the aspeed drivers were never actually *built*. The
> reason is that commit 524feb799408 ends up doing
>
>obj-$(CONFIG_ARCH_ASPEED)  += aspeed/
>
> which is completely wrong, because the Kconfig fules are
>
> depends on (ARCH_ASPEED || COMPILE_TEST) && REGMAP && MFD_SYSCON
>
> so those drivers can be configured even if ARCH_ASPEED *isn't* set.
> The Kconfig part works fine, because the soc/aspeed/Kconfig file is
> included unconditionally, but the actual build process then never
> builds anything in the drivers/soc/aspeed/ subdirectory.
>
> I solved _that_ problem by adding a new config option:
>
>   config SOC_ASPEED
>   def_bool y
>   depends on ARCH_ASPEED || COMPILE_TEST
>
> and using that instead of ARCH_ASPEED.

Yep, looks good -- thanks!

> End result: this was a somewhat messy merge, and the most subtle mess
> was because of that buggy 524feb799408 "soc: add aspeed folder and
> misc drivers").
>
> I *think* I sorted it all out correctly, and now I see the aspeed
> drivers being built (and cleanly at that) but I really *really* want
> people to double-check this all.
>
> Also, I think that the same "we don't actually build-test the end
> result" problem exists else-where for the same reasons.
>
> At the very least, drivers/soc/{atmel,rockchip,zte} seem to have the
> exact same pattern: the Kconfig files enable the drivers, but the
> Makefile in drivers/soc doesn't actually traverse into the
> subdirectories.
>
> End result: CONFIG_COMPILE_TEST doesn't actually do any compile
> testing for those drivers.
>
> I did not try to fix all of those things up, because I didn't do the
> driver movements there.

We'll follow up with patches for that, thanks for pointing it out.

I have to admit that most 

Re: [GIT PULL 3/4] ARM: SoC-related driver updates

2019-05-16 Thread Patrick Venture
From: Linus Torvalds 
Date: Thu, May 16, 2019 at 9:27 AM
To: Olof Johansson, Patrick Venture, Greg Kroah-Hartman
Cc: ARM SoC, Linux List Kernel Mailing, linux-alpha@vger.kernel.org

> On Wed, May 15, 2019 at 11:43 PM Olof Johansson  wrote:
> >
> > Various driver updates for platforms and a couple of the small driver
> > subsystems we merge through our tree:
>
> Hmm. This moved the aspeed drivers from drivers/misc to
> drivers/soc/aspeed (in commit 524feb799408 "soc: add aspeed folder and
> misc drivers"), but in the meantime we also had a new aspeed soc
> driver added (in commit 01c60dcea9f7 "drivers/misc: Add Aspeed P2A
> control driver").
>
> I ended up resolving that "conflict" by moving the new aspeed P2A
> control driver to be with the other aspeed drivers too. That seemed to
> be the cleanest model.

Thank you.  I agree.  There was some back-and-forth about the SoC move
w.r.t any new aspeed misc drivers. Whether moving them into SoC was a
good approach versus leaving the growing list in misc.  Another aspeed
driver, controlling UART was headed to misc and received push-back
that it was sufficiently specialized to go into SoC
(https://patchwork.ozlabs.org/patch/969238/).  This feedback triggered
this staging move.

I think storing the growing misc drivers for these SoCs (Aspeed,
Nuvoton) in a SoC folder is a reasonable grouping.

>
> I'm used to doing these kinds of fixups in a merge, but I have to
> admit that maybe I should have made it a separate commit, because now
> it's kind of non-obvious, and it's sometimes harder to see changes
> that are in a merge commit than in a separate commit.
>
> In particular, it looks like "git log --follow" is not smart enough to
> follow a rename through a merge. But I think that is a git problem,
> and not a very serious one at that ("git blame" has no such problem).
>
> And it means that now the merge has
>
>  drivers/{misc => soc/aspeed}/aspeed-lpc-ctrl.c   |   0
>  drivers/{misc => soc/aspeed}/aspeed-lpc-snoop.c  |   0
>  drivers/{misc => soc/aspeed}/aspeed-p2a-ctrl.c   |   0
>
> when you do "git show --stat" on it, which looks correct, and it feels
> like conceptually the right merge resolution to me.
>
> Sending out this explanatory email to everybody involved, just so that
> this doesn't take you by surprise. But it looks like Patrick Venture
> is not just the author of that moved driver, he was also involved in
> the move of the two other drivers, so I'm guessing there's not going
> to be a lot of confusion here.
>
> HOWEVER. More subtly, as part of my *testing* for this, I also
> realized that commit 524feb799408 is buggy. In my tests, the config
> worked fine, but the aspeed drivers were never actually *built*. The
> reason is that commit 524feb799408 ends up doing
>
>obj-$(CONFIG_ARCH_ASPEED)  += aspeed/
>
> which is completely wrong, because the Kconfig fules are
>
> depends on (ARCH_ASPEED || COMPILE_TEST) && REGMAP && MFD_SYSCON
>
> so those drivers can be configured even if ARCH_ASPEED *isn't* set.
> The Kconfig part works fine, because the soc/aspeed/Kconfig file is
> included unconditionally, but the actual build process then never
> builds anything in the drivers/soc/aspeed/ subdirectory.
>
> I solved _that_ problem by adding a new config option:
>
>   config SOC_ASPEED
>   def_bool y
>   depends on ARCH_ASPEED || COMPILE_TEST
>
> and using that instead of ARCH_ASPEED.

Thank you, that makes perfect sense.  When moving the drivers, I was
only considering the case where one is compiling them for use and
forgot to check for COMPILE_TEST.

>
> End result: this was a somewhat messy merge, and the most subtle mess
> was because of that buggy 524feb799408 "soc: add aspeed folder and
> misc drivers").
>
> I *think* I sorted it all out correctly, and now I see the aspeed
> drivers being built (and cleanly at that) but I really *really* want
> people to double-check this all.
>
> Also, I think that the same "we don't actually build-test the end
> result" problem exists else-where for the same reasons.
>
> At the very least, drivers/soc/{atmel,rockchip,zte} seem to have the
> exact same pattern: the Kconfig files enable the drivers, but the
> Makefile in drivers/soc doesn't actually traverse into the
> subdirectories.
>
> End result: CONFIG_COMPILE_TEST doesn't actually do any compile
> testing for those drivers.
>
> I did not try to fix all of those things up, because I didn't do the
> driver movements there.
>
>   Linus


Re: [GIT PULL 3/4] ARM: SoC-related driver updates

2019-05-16 Thread Linus Torvalds
On Wed, May 15, 2019 at 11:43 PM Olof Johansson  wrote:
>
> Various driver updates for platforms and a couple of the small driver
> subsystems we merge through our tree:

Hmm. This moved the aspeed drivers from drivers/misc to
drivers/soc/aspeed (in commit 524feb799408 "soc: add aspeed folder and
misc drivers"), but in the meantime we also had a new aspeed soc
driver added (in commit 01c60dcea9f7 "drivers/misc: Add Aspeed P2A
control driver").

I ended up resolving that "conflict" by moving the new aspeed P2A
control driver to be with the other aspeed drivers too. That seemed to
be the cleanest model.

I'm used to doing these kinds of fixups in a merge, but I have to
admit that maybe I should have made it a separate commit, because now
it's kind of non-obvious, and it's sometimes harder to see changes
that are in a merge commit than in a separate commit.

In particular, it looks like "git log --follow" is not smart enough to
follow a rename through a merge. But I think that is a git problem,
and not a very serious one at that ("git blame" has no such problem).

And it means that now the merge has

 drivers/{misc => soc/aspeed}/aspeed-lpc-ctrl.c   |   0
 drivers/{misc => soc/aspeed}/aspeed-lpc-snoop.c  |   0
 drivers/{misc => soc/aspeed}/aspeed-p2a-ctrl.c   |   0

when you do "git show --stat" on it, which looks correct, and it feels
like conceptually the right merge resolution to me.

Sending out this explanatory email to everybody involved, just so that
this doesn't take you by surprise. But it looks like Patrick Venture
is not just the author of that moved driver, he was also involved in
the move of the two other drivers, so I'm guessing there's not going
to be a lot of confusion here.

HOWEVER. More subtly, as part of my *testing* for this, I also
realized that commit 524feb799408 is buggy. In my tests, the config
worked fine, but the aspeed drivers were never actually *built*. The
reason is that commit 524feb799408 ends up doing

   obj-$(CONFIG_ARCH_ASPEED)  += aspeed/

which is completely wrong, because the Kconfig fules are

depends on (ARCH_ASPEED || COMPILE_TEST) && REGMAP && MFD_SYSCON

so those drivers can be configured even if ARCH_ASPEED *isn't* set.
The Kconfig part works fine, because the soc/aspeed/Kconfig file is
included unconditionally, but the actual build process then never
builds anything in the drivers/soc/aspeed/ subdirectory.

I solved _that_ problem by adding a new config option:

  config SOC_ASPEED
  def_bool y
  depends on ARCH_ASPEED || COMPILE_TEST

and using that instead of ARCH_ASPEED.

End result: this was a somewhat messy merge, and the most subtle mess
was because of that buggy 524feb799408 "soc: add aspeed folder and
misc drivers").

I *think* I sorted it all out correctly, and now I see the aspeed
drivers being built (and cleanly at that) but I really *really* want
people to double-check this all.

Also, I think that the same "we don't actually build-test the end
result" problem exists else-where for the same reasons.

At the very least, drivers/soc/{atmel,rockchip,zte} seem to have the
exact same pattern: the Kconfig files enable the drivers, but the
Makefile in drivers/soc doesn't actually traverse into the
subdirectories.

End result: CONFIG_COMPILE_TEST doesn't actually do any compile
testing for those drivers.

I did not try to fix all of those things up, because I didn't do the
driver movements there.

  Linus


Re: [GIT PULL 1/4] ARM: SoC platform updates

2019-05-16 Thread Arnd Bergmann
On Thu, May 16, 2019 at 5:34 PM Linus Torvalds
 wrote:
>
> On Wed, May 15, 2019 at 11:43 PM Olof Johansson  wrote:
> >
> > SoC updates, mostly refactorings and cleanups of old legacy platforms.
> > Major themes this release:
>
> Hmm. This brings in a new warning:
>
>   drivers/clocksource/timer-ixp4xx.c:78:20: warning:
> ‘ixp4xx_read_sched_clock’ defined but not used [-Wunused-function]
>
> because that drivers is enabled for build testing, but that function
> is only used under
>
>   #ifdef CONFIG_ARM
> sched_clock_register(ixp4xx_read_sched_clock, 32, timer_freq);
>   #endif
>
> It's not clear why that #ifdef is there. This driver only builds
> non-ARM when COMPILE_TEST is enabled, and that #ifdef actually breaks
> that build test.
>
> I'm going to remove that #ifdef in my merge, because I do *not* want
> to see new warnings, and it doesn't seem to make any sense.
>
> Maybe that's the wrong resolution, please holler and let me know if
> you want something else.

As far as I can tell, that is the best fix, thanks for the cleanup!

  Arnd


Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Oleg Nesterov
On 05/17, Aleksa Sarai wrote:
>
> On 2019-05-16, Oleg Nesterov  wrote:
> > On 05/17, Aleksa Sarai wrote:
> > > On 2019-05-16, Oleg Nesterov  wrote:
> > > > On 05/16, Christian Brauner wrote:
> > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > > > > created pidfds at process creation time.
> > > >
> > > > Now I am wondering why do we need CLONE_PIDFD, you can just do
> > > >
> > > > pid = fork();
> > > > pidfd_open(pid);
> > >
> > > While the race window would be exceptionally short, there is the
> > > possibility that the child will die
> >
> > Yes,
> >
> > > and their pid will be recycled
> > > before you do pidfd_open().
> >
> > No.
> >
> > Unless the caller's sub-thread does wait() before pidfd_open(), of course.
> > Or unless you do signal(SIGCHILD, SIG_IGN).
>
> What about CLONE_PARENT?

I should have mentioned CLONE_PARENT ;)

Of course in this case the child can be reaped before pidfd_open(). But how 
often
do you or other people use clone(CLONE_PARENT) ? not to mention you can 
trivially
eliminate/detect this race if you really need this.

Don't get me wrong, I am not trying to say that CLONE_PIDFD is a bad idea.

But to me pidfd_open() is much more useful. Say, as a perl programmer I can 
easily
use pidfd_open(), but not CLONE_PIDFD.

Oleg.



Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Aleksa Sarai
On 2019-05-16, Oleg Nesterov  wrote:
> On 05/17, Aleksa Sarai wrote:
> > On 2019-05-16, Oleg Nesterov  wrote:
> > > On 05/16, Christian Brauner wrote:
> > > > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > > > created pidfds at process creation time.
> > >
> > > Now I am wondering why do we need CLONE_PIDFD, you can just do
> > >
> > >   pid = fork();
> > >   pidfd_open(pid);
> >
> > While the race window would be exceptionally short, there is the
> > possibility that the child will die
> 
> Yes,
> 
> > and their pid will be recycled
> > before you do pidfd_open().
> 
> No.
> 
> Unless the caller's sub-thread does wait() before pidfd_open(), of course.
> Or unless you do signal(SIGCHILD, SIG_IGN).

What about CLONE_PARENT?

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Oleg Nesterov
On 05/17, Aleksa Sarai wrote:
>
> On 2019-05-16, Oleg Nesterov  wrote:
> > On 05/16, Christian Brauner wrote:
> > >
> > > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > > created pidfds at process creation time.
> >
> > Now I am wondering why do we need CLONE_PIDFD, you can just do
> >
> > pid = fork();
> > pidfd_open(pid);
>
> While the race window would be exceptionally short, there is the
> possibility that the child will die

Yes,

> and their pid will be recycled
> before you do pidfd_open().

No.

Unless the caller's sub-thread does wait() before pidfd_open(), of course.
Or unless you do signal(SIGCHILD, SIG_IGN).

Oleg.



Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Christian Brauner
On Thu, May 16, 2019 at 04:56:08PM +0200, Geert Uytterhoeven wrote:
> Hi Christian, David,
> 
> On Thu, May 16, 2019 at 4:00 PM Christian Brauner  
> wrote:
> > This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> > pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> > process that is created via traditional fork()/clone() calls that is only
> > referenced by a PID:
> >
> > int pidfd = pidfd_open(1234, 0);
> > ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> > However, a lot of processes get created with traditional PID-based calls
> > such as fork() or clone() (without CLONE_PIDFD). For these processes a
> > caller can currently not create a pollable pidfd. This is a huge problem
> > for Android's low memory killer (LMK) and service managers such as systemd.
> > Both are examples of tools that want to make use of pidfds to get reliable
> > notification of process exit for non-parents (pidfd polling) and race-free
> > signal sending (pidfd_send_signal()). They intend to switch to this API for
> > process supervision/management as soon as possible. Having no way to get
> > pollable pidfds from PID-only processes is one of the biggest blockers for
> > them in adopting this api. With pidfd_open() making it possible to retrieve
> > pidfd for PID-based processes we enable them to adopt this api.
> >
> > In line with Arnd's recent changes to consolidate syscall numbers across
> > architectures, I have added the pidfd_open() syscall to all architectures
> > at the same time.
> 
> > +428common  pidfd_open  sys_pidfd_open
> 
> This number conflicts with "[PATCH 4/4] uapi: Wire up the mount API
> syscalls on non-x86 arches", which is requested to be included before
> rc1.

Yep, already spotted this thanks to Arnd! Will change the syscall
numbers.

Thanks!
Christian

> 
> Note that none of this is part of linux-next.
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Christian Brauner
On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
>   pid = fork();
>   pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +   int fd, ret;
> > +   struct pid *p;
> > +   struct task_struct *tsk;
> > +
> > +   if (flags)
> > +   return -EINVAL;
> > +
> > +   if (pid <= 0)
> > +   return -EINVAL;
> > +
> > +   p = find_get_pid(pid);
> > +   if (!p)
> > +   return -ESRCH;
> > +
> > +   ret = 0;
> > +   rcu_read_lock();
> > +   /*
> > +* If this returns non-NULL the pid was used as a thread-group
> > +* leader. Note, we race with exec here: If it changes the
> > +* thread-group leader we might return the old leader.
> > +*/
> > +   tsk = pid_task(p, PIDTYPE_TGID);
> > +   if (!tsk)
> > +   ret = -ESRCH;
> > +   rcu_read_unlock();
> > +
> > +   fd = ret ?: pidfd_create(p);
> > +   put_pid(p);
> > +   return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov 
> 
> But why do we need task_struct *tsk?
> 
>   rcu_read_lock();
>   if (!pid_task(PIDTYPE_TGID))
>   ret = -ESRCH;
>   rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
>   // shut up rcu_dereference_check()
>   rcu_lock_acquire(_lock_map);
>   if (!pid_task(PIDTYPE_TGID))
>   ret = -ESRCH;
>   rcu_lock_release(_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian


Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Aleksa Sarai
On 2019-05-16, Oleg Nesterov  wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
>   pid = fork();
>   pidfd_open(pid);

While the race window would be exceptionally short, there is the
possibility that the child will die and their pid will be recycled
before you do pidfd_open(). CLONE_PIDFD removes the race completely.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Geert Uytterhoeven
Hi Christian, David,

On Thu, May 16, 2019 at 4:00 PM Christian Brauner  wrote:
> This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> process that is created via traditional fork()/clone() calls that is only
> referenced by a PID:
>
> int pidfd = pidfd_open(1234, 0);
> ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
>
> With the introduction of pidfds through CLONE_PIDFD it is possible to
> created pidfds at process creation time.
> However, a lot of processes get created with traditional PID-based calls
> such as fork() or clone() (without CLONE_PIDFD). For these processes a
> caller can currently not create a pollable pidfd. This is a huge problem
> for Android's low memory killer (LMK) and service managers such as systemd.
> Both are examples of tools that want to make use of pidfds to get reliable
> notification of process exit for non-parents (pidfd polling) and race-free
> signal sending (pidfd_send_signal()). They intend to switch to this API for
> process supervision/management as soon as possible. Having no way to get
> pollable pidfds from PID-only processes is one of the biggest blockers for
> them in adopting this api. With pidfd_open() making it possible to retrieve
> pidfd for PID-based processes we enable them to adopt this api.
>
> In line with Arnd's recent changes to consolidate syscall numbers across
> architectures, I have added the pidfd_open() syscall to all architectures
> at the same time.

> +428common  pidfd_open  sys_pidfd_open

This number conflicts with "[PATCH 4/4] uapi: Wire up the mount API
syscalls on non-x86 arches", which is requested to be included before
rc1.

Note that none of this is part of linux-next.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-16 Thread Aleksa Sarai
On 2019-05-16, Christian Brauner  wrote:
> On Wed, May 15, 2019 at 10:45:06AM -0700, Daniel Colascione wrote:
> > On Wed, May 15, 2019 at 3:04 AM Christian Brauner  
> > wrote:
> > > +   if (pid <= 0)
> > > +   return -EINVAL;
> > 
> > WDYT of defining pid == 0 to mean "open myself"?
> 
> I'm torn. It be a nice shortcut of course but pid being 0 is usually an
> indicator for child processes. So unless the getpid() before
> pidfd_open() is an issue I'd say let's leave it as is. If you really
> want the shortcut might -1 be better?

I'd suggest not using negative numbers, and instead reserving them for
PIDTYPE_TGID if we ever want to have that in the future. IMHO, doing

  pfd = pidfd_open(getpid(), 0);

is not the end of the world.


-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-16 Thread Christian Brauner
On Thu, May 16, 2019 at 04:03:27PM +0200, Jann Horn wrote:
> On Thu, May 16, 2019 at 3:08 PM Christian Brauner  
> wrote:
> > On Wed, May 15, 2019 at 10:45:06AM -0700, Daniel Colascione wrote:
> > > On Wed, May 15, 2019 at 3:04 AM Christian Brauner  
> > > wrote:
> > > >
> > > > This adds the pidfd_open() syscall. It allows a caller to retrieve 
> > > > pollable
> > > > pidfds for a process which did not get created via CLONE_PIDFD, i.e. 
> > > > for a
> > > > process that is created via traditional fork()/clone() calls that is 
> > > > only
> > > > referenced by a PID:
> [...]
> > > > +/**
> > > > + * pidfd_open() - Open new pid file descriptor.
> > > > + *
> > > > + * @pid:   pid for which to retrieve a pidfd
> > > > + * @flags: flags to pass
> > > > + *
> > > > + * This creates a new pid file descriptor with the O_CLOEXEC flag set 
> > > > for
> > > > + * the process identified by @pid. Currently, the process identified by
> > > > + * @pid must be a thread-group leader. This restriction currently 
> > > > exists
> > > > + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD 
> > > > cannot
> > > > + * be used with CLONE_THREAD) and pidfd polling (only supports thread 
> > > > group
> > > > + * leaders).
> > > > + *
> > > > + * Return: On success, a cloexec pidfd is returned.
> > > > + * On error, a negative errno number will be returned.
> > > > + */
> > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > > > +{
> [...]
> > > > +   if (pid <= 0)
> > > > +   return -EINVAL;
> > >
> > > WDYT of defining pid == 0 to mean "open myself"?
> >
> > I'm torn. It be a nice shortcut of course but pid being 0 is usually an
> > indicator for child processes. So unless the getpid() before
> > pidfd_open() is an issue I'd say let's leave it as is. If you really
> > want the shortcut might -1 be better?
> 
> Joining the bikeshed painting club: Please don't allow either 0 or -1
> as shortcut for "self". James Forshaw found an Android security bug a
> while back (https://bugs.chromium.org/p/project-zero/issues/detail?id=727)
> that passed a PID to getpidcon(), except that the PID was 0
> (placeholder for oneway binder transactions), and then the service
> thought it was talking to itself. You could pick some other number and
> provide a #define for that, but I think pidfd_open(getpid(), ...)
> makes more sense.

Yes, I agree. I left it as is for v1, i.e. no shortcut; getpid() should
do.

Christian


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-16 Thread Jann Horn
On Thu, May 16, 2019 at 3:08 PM Christian Brauner  wrote:
> On Wed, May 15, 2019 at 10:45:06AM -0700, Daniel Colascione wrote:
> > On Wed, May 15, 2019 at 3:04 AM Christian Brauner  
> > wrote:
> > >
> > > This adds the pidfd_open() syscall. It allows a caller to retrieve 
> > > pollable
> > > pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> > > process that is created via traditional fork()/clone() calls that is only
> > > referenced by a PID:
[...]
> > > +/**
> > > + * pidfd_open() - Open new pid file descriptor.
> > > + *
> > > + * @pid:   pid for which to retrieve a pidfd
> > > + * @flags: flags to pass
> > > + *
> > > + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> > > + * the process identified by @pid. Currently, the process identified by
> > > + * @pid must be a thread-group leader. This restriction currently exists
> > > + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> > > + * be used with CLONE_THREAD) and pidfd polling (only supports thread 
> > > group
> > > + * leaders).
> > > + *
> > > + * Return: On success, a cloexec pidfd is returned.
> > > + * On error, a negative errno number will be returned.
> > > + */
> > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > > +{
[...]
> > > +   if (pid <= 0)
> > > +   return -EINVAL;
> >
> > WDYT of defining pid == 0 to mean "open myself"?
>
> I'm torn. It be a nice shortcut of course but pid being 0 is usually an
> indicator for child processes. So unless the getpid() before
> pidfd_open() is an issue I'd say let's leave it as is. If you really
> want the shortcut might -1 be better?

Joining the bikeshed painting club: Please don't allow either 0 or -1
as shortcut for "self". James Forshaw found an Android security bug a
while back (https://bugs.chromium.org/p/project-zero/issues/detail?id=727)
that passed a PID to getpidcon(), except that the PID was 0
(placeholder for oneway binder transactions), and then the service
thought it was talking to itself. You could pick some other number and
provide a #define for that, but I think pidfd_open(getpid(), ...)
makes more sense.


[PATCH v1 2/2] tests: add pidfd_open() tests

2019-05-16 Thread Christian Brauner
This adds testing for the new pidfd_open() syscalls. Specifically, we test:
- that no invalid flags can be passed to pidfd_open()
- that no invalid pid can be passed to pidfd_open()
- that a pidfd can be retrieved with pidfd_open()
- that the retrieved pidfd references the correct pid

Signed-off-by: Christian Brauner 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: "Michael Kerrisk (man-pages)" 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
v1: unchanged
---
 tools/testing/selftests/pidfd/Makefile|   2 +-
 tools/testing/selftests/pidfd/pidfd.h |  57 ++
 .../testing/selftests/pidfd/pidfd_open_test.c | 170 ++
 tools/testing/selftests/pidfd/pidfd_test.c|  41 +
 4 files changed, 229 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/pidfd/pidfd.h
 create mode 100644 tools/testing/selftests/pidfd/pidfd_open_test.c

diff --git a/tools/testing/selftests/pidfd/Makefile 
b/tools/testing/selftests/pidfd/Makefile
index deaf8073bc06..b36c0be70848 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -1,6 +1,6 @@
 CFLAGS += -g -I../../../../usr/include/
 
-TEST_GEN_PROGS := pidfd_test
+TEST_GEN_PROGS := pidfd_test pidfd_open_test
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/pidfd/pidfd.h 
b/tools/testing/selftests/pidfd/pidfd.h
new file mode 100644
index ..8452e910463f
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PIDFD_H
+#define __PIDFD_H
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+/*
+ * The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
+ * That means, when it wraps around any pid < 300 will be skipped.
+ * So we need to use a pid > 300 in order to test recycling.
+ */
+#define PID_RECYCLE 1000
+
+/*
+ * Define a few custom error codes for the child process to clearly indicate
+ * what is happening. This way we can tell the difference between a system
+ * error, a test error, etc.
+ */
+#define PIDFD_PASS 0
+#define PIDFD_FAIL 1
+#define PIDFD_ERROR 2
+#define PIDFD_SKIP 3
+#define PIDFD_XFAIL 4
+
+int wait_for_pid(pid_t pid)
+{
+   int status, ret;
+
+again:
+   ret = waitpid(pid, , 0);
+   if (ret == -1) {
+   if (errno == EINTR)
+   goto again;
+
+   return -1;
+   }
+
+   if (!WIFEXITED(status))
+   return -1;
+
+   return WEXITSTATUS(status);
+}
+
+
+#endif /* __PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/pidfd_open_test.c 
b/tools/testing/selftests/pidfd/pidfd_open_test.c
new file mode 100644
index ..9b073c1ac618
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_open_test.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pidfd.h"
+#include "../kselftest.h"
+
+static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
+{
+   return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static int safe_int(const char *numstr, int *converted)
+{
+   char *err = NULL;
+   long sli;
+
+   errno = 0;
+   sli = strtol(numstr, , 0);
+   if (errno == ERANGE && (sli == LONG_MAX || sli == LONG_MIN))
+   return -ERANGE;
+
+   if (errno != 0 && sli == 0)
+   return -EINVAL;
+
+   if (err == numstr || *err != '\0')
+   return -EINVAL;
+
+   if (sli > INT_MAX || sli < INT_MIN)
+   return -ERANGE;
+
+   *converted = (int)sli;
+   return 0;
+}
+
+static int char_left_gc(const char *buffer, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (buffer[i] == ' ' ||
+   buffer[i] == '\t')
+   continue;
+
+   return i;
+   }
+
+   return 0;
+}
+
+static int char_right_gc(const char *buffer, size_t len)
+{
+   int i;
+
+   for (i = len - 1; i >= 0; i--) {
+   if (buffer[i] == ' '  ||
+   buffer[i] == '\t' ||
+   buffer[i] == '\n' ||
+   buffer[i] == '\0')
+   continue;
+
+   return i + 1;
+   }
+
+   return 0;
+}
+
+static char *trim_whitespace_in_place(char *buffer)
+{
+   buffer += char_left_gc(buffer, strlen(buffer));
+   buffer[char_right_gc(buffer, strlen(buffer))] = '\0';
+   return buffer;
+}
+
+static pid_t get_pid_from_fdinfo_file(int pidfd, const char *key, size_t 
keylen)
+{
+   int ret;

[PATCH v1 1/2] pid: add pidfd_open()

2019-05-16 Thread Christian Brauner
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a huge problem
for Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfd for PID-based processes we enable them to adopt this api.

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner 
Acked-by: Geert Uytterhoeven 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
v1:
- kbuild test robot :
  - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov :
  - use simpler thread-group leader check
---
 arch/alpha/kernel/syscalls/syscall.tbl  |  1 +
 arch/arm/tools/syscall.tbl  |  1 +
 arch/arm64/include/asm/unistd32.h   |  2 +
 arch/ia64/kernel/syscalls/syscall.tbl   |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl   |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl|  1 +
 arch/s390/kernel/syscalls/syscall.tbl   |  1 +
 arch/sh/kernel/syscalls/syscall.tbl |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl |  1 +
 include/linux/pid.h |  1 +
 include/linux/syscalls.h|  1 +
 include/uapi/asm-generic/unistd.h   |  4 +-
 kernel/fork.c   |  2 +-
 kernel/pid.c| 50 +
 20 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index 165f268beafc..ddc3c93ad7a7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
 535common  io_uring_setup  sys_io_uring_setup
 536common  io_uring_enter  sys_io_uring_enter
 537common  io_uring_register   sys_io_uring_register
+538common  pidfd_open  sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 0393917eaa57..fc41fb34a636 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -441,3 +441,4 @@
 425common  io_uring_setup  sys_io_uring_setup
 426common  io_uring_enter  sys_io_uring_enter
 427common  io_uring_register   sys_io_uring_register
+428common  pidfd_open  sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 23f1a44acada..350e2049b4a9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -874,6 +874,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
 __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
 #define __NR_io_uring_register 427
 __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_pidfd_open 428
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl 
b/arch/ia64/kernel/syscalls/syscall.tbl
index 56e3d0b685e1..7115f6dd347a 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -348,3 +348,4 @@
 425common  io_uring_setup  sys_io_uring_setup
 426common  io_uring_enter  sys_io_uring_enter
 427common  io_uring_register   sys_io_uring_register
+428   

Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-16 Thread Christian Brauner
On Wed, May 15, 2019 at 10:45:06AM -0700, Daniel Colascione wrote:
> On Wed, May 15, 2019 at 3:04 AM Christian Brauner  
> wrote:
> >
> > This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> > pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> > process that is created via traditional fork()/clone() calls that is only
> > referenced by a PID:
> 
> Thanks for doing this work. I'm really looking forward to this new
> approach to process management.

Thanks! Glad to hear!

> 
> > int pidfd = pidfd_open(1234, 0);
> > ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> > However, a lot of processes get created with traditional PID-based calls
> > such as fork() or clone() (without CLONE_PIDFD). For these processes a
> > caller can currently not create a pollable pidfd. This is a huge problem
> > for Android's low memory killer (LMK) and service managers such as systemd.
> > Both are examples of tools that want to make use of pidfds to get reliable
> > notification of process exit for non-parents (pidfd polling) and race-free
> > signal sending (pidfd_send_signal()). They intend to switch to this API for
> > process supervision/management as soon as possible. Having no way to get
> > pollable pidfds from PID-only processes is one of the biggest blockers for
> > them in adopting this api. With pidfd_open() making it possible to retrieve
> > pidfd for PID-based processes we enable them to adopt this api.
> >
> > In line with Arnd's recent changes to consolidate syscall numbers across
> > architectures, I have added the pidfd_open() syscall to all architectures
> > at the same time.
> 
> I'm glad it's easier now.
> 
> >  arch/alpha/kernel/syscalls/syscall.tbl  |  1 +
> >  arch/arm64/include/asm/unistd32.h   |  2 +
> >  arch/ia64/kernel/syscalls/syscall.tbl   |  1 +
> >  arch/m68k/kernel/syscalls/syscall.tbl   |  1 +
> >  arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
> >  arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
> >  arch/parisc/kernel/syscalls/syscall.tbl |  1 +
> >  arch/powerpc/kernel/syscalls/syscall.tbl|  1 +
> >  arch/s390/kernel/syscalls/syscall.tbl   |  1 +
> >  arch/sh/kernel/syscalls/syscall.tbl |  1 +
> >  arch/sparc/kernel/syscalls/syscall.tbl  |  1 +
> >  arch/x86/entry/syscalls/syscall_32.tbl  |  1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl  |  1 +
> >  arch/xtensa/kernel/syscalls/syscall.tbl |  1 +
> 
> It'd be nice to arrange the system call tables so that we need to
> change only one file when adding a new system call.
> 
> [Snip system call wiring]
> 
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -67,6 +67,7 @@ struct pid
> >  extern struct pid init_struct_pid;
> >
> >  extern const struct file_operations pidfd_fops;
> > +extern int pidfd_create(struct pid *pid);
> >
> >  static inline struct pid *get_pid(struct pid *pid)
> >  {
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index e2870fe1be5b..989055e0b501 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t 
> > which_clock,
> > struct old_timex32 __user *tx);
> >  asmlinkage long sys_syncfs(int fd);
> >  asmlinkage long sys_setns(int fd, int nstype);
> > +asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
> >  asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
> >  unsigned int vlen, unsigned flags);
> >  asmlinkage long sys_process_vm_readv(pid_t pid,
> > diff --git a/include/uapi/asm-generic/unistd.h 
> > b/include/uapi/asm-generic/unistd.h
> > index dee7292e1df6..94a257a93d20 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
> >  __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
> >  #define __NR_io_uring_register 427
> >  __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
> > +#define __NR_pidfd_open 428
> > +__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
> >
> >  #undef __NR_syscalls
> > -#define __NR_syscalls 428
> > +#define __NR_syscalls 429
> >
> >  /*
> >   * 32 bit systems traditionally used different
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 737db1828437..980cc1d2b8d4 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1714,7 +1714,7 @@ const struct file_operations pidfd_fops = {
> >   * Return: On success, a cloexec pidfd is returned.
> >   * On error, a negative errno number will be returned.
> >   */
> > -static int pidfd_create(struct pid *pid)
> > +int pidfd_create(struct pid *pid)
> >  {
> > int fd;
> >
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 20881598bdfa..237d18d6ecb8 100644
> > ---