Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Al Viro
On Sat, May 25, 2019 at 12:27:40AM +0300, Alexey Dobriyan wrote:

> What about orthogonality of interfaces?
> 
>   fdmap()
>   bulk_close()
> 
> Now fdmap() can be reused for lsof/criu and it is only 2 system calls
> for close-everything usecase which is OK because readdir is 4(!) minimum:
> 
>   open
>   getdents
>   getdents() = 0
>   close
> 
> Writing all of this I understood how fdmap can be made more faster which
> neither getdents() nor even read() have the luxury of: it can return
> a flag if more data is available so that application would do next fdmap()
> only if truly necessary.

Tactless question: what has traumatised you so badly about string operations?
Because that seems to be the common denominator to a lot of things...


Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Alexey Dobriyan
On Fri, May 24, 2019 at 11:55:44AM -0700, Linus Torvalds wrote:
> On Fri, May 24, 2019 at 11:39 AM Alexey Dobriyan  wrote:
> >
> > > Would there ever be any other reason to traverse unknown open files
> > > than to close them?
> >
> > This is what lsof(1) does:
> 
> I repeat: Would there ever be any other reason to traverse unknown
> open files than to close them?
> 
> lsof is not AT ALL a relevant argument.
> 
> lsof fundamentally wants /proc, because lsof looks at *other*
> processes. That has absolutely zero to do with fdmap. lsof does *not*
> want fdmap at all. It wants "list other processes files". Which is
> very much what /proc is all about.
> 
> So your argument that "fdmap is more generic" is bogus.
> 
> fdmap is entirely pointless unless you can show a real and relevant
> (to performance) use of it.
> 
> When you would *possibly* have a "let me get a list of all the file
> descriptors I have open, because I didn't track them myself"
> situation?  That makes no sense. Particularly from a performance
> standpoint.
> 
> In contrast, "close_range()" makes sense as an operation.

What about orthogonality of interfaces?

fdmap()
bulk_close()

Now fdmap() can be reused for lsof/criu and it is only 2 system calls
for close-everything usecase which is OK because readdir is 4(!) minimum:

open
getdents
getdents() = 0
close

Writing all of this I understood how fdmap can be made more faster which
neither getdents() nor even read() have the luxury of: it can return
a flag if more data is available so that application would do next fdmap()
only if truly necessary.

> I can
> explain exactly when it would be used, and I can easily see a
> situation where "I've opened a ton of files, now I want to release
> them" is a valid model of operation. And it's a valid optimization to
> do a bulk operation like that.
> 
> IOW, close_range() makes sense as an operation even if you could just
> say "ok, I know exactly what files I have open". But it also makes
> sense as an operation for the case of "I don't even care what files I
> have open, I just want to close them".
> 
> In contrast, the "I have opened a ton of files, and I don't even know
> what the hell I did, so can you list them for me" makes no sense.
> 
> Because outside of "close them", there's no bulk operation that makes
> sense on random file handles that you don't know what they are. Unless
> you iterate over them and do the stat thing or whatever to figure it
> out - which is lsof, but as mentioned, it's about *other* peoples
> files.

What you're doing is making exactly one usecase take exactly one system
call and leaving everything else deal with /proc. Stracing lsof shows
very clearly how stupid and how wasteful it is. Especially now that
we're post-meltdown era caring about system call costs (yeah suure).

I'm suggesting make close-universe usecase take only 2 system calls.
which is still better than anything /proc can offer.


Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Linus Torvalds
On Fri, May 24, 2019 at 11:39 AM Alexey Dobriyan  wrote:
>
> > Would there ever be any other reason to traverse unknown open files
> > than to close them?
>
> This is what lsof(1) does:

I repeat: Would there ever be any other reason to traverse unknown
open files than to close them?

lsof is not AT ALL a relevant argument.

lsof fundamentally wants /proc, because lsof looks at *other*
processes. That has absolutely zero to do with fdmap. lsof does *not*
want fdmap at all. It wants "list other processes files". Which is
very much what /proc is all about.

So your argument that "fdmap is more generic" is bogus.

fdmap is entirely pointless unless you can show a real and relevant
(to performance) use of it.

When you would *possibly* have a "let me get a list of all the file
descriptors I have open, because I didn't track them myself"
situation?  That makes no sense. Particularly from a performance
standpoint.

In contrast, "close_range()" makes sense as an operation. I can
explain exactly when it would be used, and I can easily see a
situation where "I've opened a ton of files, now I want to release
them" is a valid model of operation. And it's a valid optimization to
do a bulk operation like that.

IOW, close_range() makes sense as an operation even if you could just
say "ok, I know exactly what files I have open". But it also makes
sense as an operation for the case of "I don't even care what files I
have open, I just want to close them".

In contrast, the "I have opened a ton of files, and I don't even know
what the hell I did, so can you list them for me" makes no sense.

Because outside of "close them", there's no bulk operation that makes
sense on random file handles that you don't know what they are. Unless
you iterate over them and do the stat thing or whatever to figure it
out - which is lsof, but as mentioned, it's about *other* peoples
files.

   Linus


Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Alexey Dobriyan
On Thu, May 23, 2019 at 02:34:31PM -0700, Linus Torvalds wrote:
> On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan  wrote:
> >
> > > This is v2 of this patchset.
> >
> > We've sent fdmap(2) back in the day:
> 
> Well, if the main point of the exercise is performance, then fdmap()
> is clearly inferior.

This is not true because there are other usecases.

Current equivalent is readdir() where getdents is essentially bulk fdmap()
with pretty-printing. glibc does getdents into 32KB buffer.

There was a bulk taskstats patch long before meltdown fiasco.

Unfortunately closerange() only closes ranges.
This is why I didn't even tried to send closefrom(2) from OpenBSD.

> Sadly, with all the HW security mitigation, system calls are no longer cheap.
> 
> Would there ever be any other reason to traverse unknown open files
> than to close them?

This is what lsof(1) does:

3140  openat(AT_FDCWD, "/proc/29499/fd", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
3140  fstat(4, {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
3140  getdents(4, /* 6 entries */, 32768) = 144
3140  readlink("/proc/29499/fd/0", "/dev/pts/4", 4096) = 10
3140  lstat("/proc/29499/fd/0", {st_mode=S_IFLNK|0700, st_size=64, ...}) = 0
3140  stat("/proc/29499/fd/0", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), 
...}) = 0
3140  openat(AT_FDCWD, "/proc/29499/fdinfo/0", O_RDONLY) = 7
3140  fstat(7, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
3140  read(7, "pos:\t0\nflags:\t02002\nmnt_id:\t24\n", 1024) = 31
3140  read(7, "", 1024) = 0
3140  close(7)
...

Once fdmap(2) or equivalent is in, more bulk system calls operating on
descriptors can pop up. But closefrom() will remain closefrom().


Re: [PATCH v2 0/2] close_range()

2019-05-24 Thread Christian Brauner
On Thu, May 23, 2019 at 02:34:31PM -0700, Linus Torvalds wrote:
> On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan  wrote:
> >
> > > This is v2 of this patchset.
> >
> > We've sent fdmap(2) back in the day:
> 
> Well, if the main point of the exercise is performance, then fdmap()
> is clearly inferior.
> 
> Sadly, with all the HW security mitigation, system calls are no longer cheap.
> 
> Would there ever be any other reason to traverse unknown open files
> than to close them?

I have had lively discussions and interestingly worded mails on account
of all of this. But noone has brought up this scenario. Florian also
said that it's not needed [1].

If we really want something like that we don't really need a new syscall
I think. We can just do a prctl() command or fcntl() command that will
give you back the next open fd.

There's imho crazy ideas out there what people expect a multi-close file
descriptor solution to look like. Service manager people apparently
think it would be a great idea to have a syscall that takes an array of
fds which the kernel is supposed to leave open and close all others,
basically "close all of the fds only leave out those I tell you".
I think for such a use-cases they can push for a prctl(PR_GET_NEXTFD, 2)
or a fcntl(2, F_GET_NEXTFD) and implement that in userspace.

I really only care about having a performant solution to closing a range
of fds that's a little more flexible than closefrom() without going all
crazy generic and copying (possibly) large bits of data between kernel-
and userspace.

close_range() is really something I've picked up on the side because the
current state has bothered me (and others) a long time whenever I have
to have my userspace hat on. With Al being in favor of it this seemed
like we should do it.
I actually wanted to have Jann's and my clone6() version on the table by
now since that would unblock larger things like the time namespace
patchset.

In any case I'll send v3 with my max()/min() braino fixed that Oleg
thankfully spotted and the split into two patches that Arnd suggested.

[1]: https://lkml.org/lkml/2019/5/21/516


Re: [PATCH v2 0/2] close_range()

2019-05-23 Thread Linus Torvalds
On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan  wrote:
>
> > This is v2 of this patchset.
>
> We've sent fdmap(2) back in the day:

Well, if the main point of the exercise is performance, then fdmap()
is clearly inferior.

Sadly, with all the HW security mitigation, system calls are no longer cheap.

Would there ever be any other reason to traverse unknown open files
than to close them?

   Linus


Re: [PATCH v2 0/2] close_range()

2019-05-23 Thread Alexey Dobriyan
> This is v2 of this patchset.

We've sent fdmap(2) back in the day:
https://marc.info/?l=linux-kernel=150628359803324=4

It can do everything close_range() does and potentially more.

If people ask for it I can rebase it and resend.

P.S.: you are 2 steps behind :-)
https://lwn.net/Articles/490224/