Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-17 Thread Dominik Brodowski
On Thu, Mar 15, 2018 at 10:02:04PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
>  wrote:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> >
> > The rationale for this change is described in patch 1 as follows:
> >
> > The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > through kernel entry points, but not from the kernel itself. This
> > will allow cleanups and optimizations to the entry paths *and* to
> > the parts of the kernel code which currently need to pretend to be
> > userspace in order to make use of syscalls.
> >
> > The whole series can be found at
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> > syscalls-next
> >
> > and will be submitted for merging for the v4.17-rc1 cycle, probably together
> > with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Nice work!
> 
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscall seem to be harder to change to
> that than others, so not sure if it's worth the effort.

Thanks for your review -- on this issue, please see my other messages.

> Another open question are the declarations in include/linux/syscalls.h.
> These serve as a help for type-checking today, making sure that
> each syscall we refer to from either the syscall table or called
> by some kernel function uses the same prototype that matches
> the syscall definition, which raises the question of whether we want
> to keep the header around at all.

Well, we do not want to call syscalls from other kernel functions, so that
issue will go away by means of these patchsets anyway. With regard to
type-checking the syscall table, we might want to keep the definitions
around and/or generate prototypes from SYSCALL_DEFINEx() directly.

Thanks,
Dominik


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-17 Thread Dominik Brodowski
On Thu, Mar 15, 2018 at 10:02:04PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
>  wrote:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> >
> > The rationale for this change is described in patch 1 as follows:
> >
> > The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > through kernel entry points, but not from the kernel itself. This
> > will allow cleanups and optimizations to the entry paths *and* to
> > the parts of the kernel code which currently need to pretend to be
> > userspace in order to make use of syscalls.
> >
> > The whole series can be found at
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> > syscalls-next
> >
> > and will be submitted for merging for the v4.17-rc1 cycle, probably together
> > with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Nice work!
> 
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscall seem to be harder to change to
> that than others, so not sure if it's worth the effort.

Thanks for your review -- on this issue, please see my other messages.

> Another open question are the declarations in include/linux/syscalls.h.
> These serve as a help for type-checking today, making sure that
> each syscall we refer to from either the syscall table or called
> by some kernel function uses the same prototype that matches
> the syscall definition, which raises the question of whether we want
> to keep the header around at all.

Well, we do not want to call syscalls from other kernel functions, so that
issue will go away by means of these patchsets anyway. With regard to
type-checking the syscall table, we might want to keep the definitions
around and/or generate prototypes from SYSCALL_DEFINEx() directly.

Thanks,
Dominik


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Linus Torvalds
On Fri, Mar 16, 2018 at 7:20 AM, Al Viro  wrote:
> On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
>>
>> A lot of the issues here is that the initramfs / do_mount code
>> is written as if it was user space code, but in kernel space.  E.g.
>> using file desriptors etc.

Yeah, some of it could probably pass a 'struct filp *' around instead.

So there are definitely things we could do once we no longer use the
raw system calls anyway.

> ... and I still wonder if it would make more sense to kick that crap
> out into userland.

Oh, no, let's not do that. Even if we were to still maintain control
of user space, it would mean yet another nasty special case for the
compiler and linker scripts and for our initrd generation.

And if we were to spin it out entirely (aka udevd and friends), it
would become one of those nasty situations where there's some *very*
odd code that we need to keep compatibility with because you might run
a new kernel and some old "pre-init user code" stuff.

I'd much rather just make it look more like kernel code.

And maybe remove some code entirely. Christ, we still have the logic
in there to change *floppies* if the ramdisk doesn't fit on a single
floppy disk.  Does it work? Probably not, since presumably it hasn't
been used in ages. But it's still there.

So some of the ioctl's etc are due to insanely old legacy cases.

 Linus


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Linus Torvalds
On Fri, Mar 16, 2018 at 7:20 AM, Al Viro  wrote:
> On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
>>
>> A lot of the issues here is that the initramfs / do_mount code
>> is written as if it was user space code, but in kernel space.  E.g.
>> using file desriptors etc.

Yeah, some of it could probably pass a 'struct filp *' around instead.

So there are definitely things we could do once we no longer use the
raw system calls anyway.

> ... and I still wonder if it would make more sense to kick that crap
> out into userland.

Oh, no, let's not do that. Even if we were to still maintain control
of user space, it would mean yet another nasty special case for the
compiler and linker scripts and for our initrd generation.

And if we were to spin it out entirely (aka udevd and friends), it
would become one of those nasty situations where there's some *very*
odd code that we need to keep compatibility with because you might run
a new kernel and some old "pre-init user code" stuff.

I'd much rather just make it look more like kernel code.

And maybe remove some code entirely. Christ, we still have the logic
in there to change *floppies* if the ramdisk doesn't fit on a single
floppy disk.  Does it work? Probably not, since presumably it hasn't
been used in ages. But it's still there.

So some of the ioctl's etc are due to insanely old legacy cases.

 Linus


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Al Viro
On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> > Yes. And honestly, I'd rather have these kinds of "just change the
> > calling convention" almost automated patches separately - and then the
> > cleanups later.
> > 
> > Mixing the calling convention change and the cleanup together is just
> > confusing and potentially causes subtle issues.
> 
> A lot of the issues here is that the initramfs / do_mount code
> is written as if it was user space code, but in kernel space.  E.g.
> using file desriptors etc.

... and I still wonder if it would make more sense to kick that crap
out into userland.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Al Viro
On Fri, Mar 16, 2018 at 01:54:23AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> > Yes. And honestly, I'd rather have these kinds of "just change the
> > calling convention" almost automated patches separately - and then the
> > cleanups later.
> > 
> > Mixing the calling convention change and the cleanup together is just
> > confusing and potentially causes subtle issues.
> 
> A lot of the issues here is that the initramfs / do_mount code
> is written as if it was user space code, but in kernel space.  E.g.
> using file desriptors etc.

... and I still wonder if it would make more sense to kick that crap
out into userland.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Dominik Brodowski
On Fri, Mar 16, 2018 at 09:01:11AM +, Zhang, Ning A wrote:
> 在 2018-03-15四的 20:04 +0100,Dominik Brodowski写道:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> > 
> > The rationale for this change is described in patch 1 as follows:
> > 
> > The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > through kernel entry points, but not from the kernel itself. This
> > will allow cleanups and optimizations to the entry paths *and* to
> > the parts of the kernel code which currently need to pretend to be
> > userspace in order to make use of syscalls.
> 
> I think this is really bad to change syscalls one by one, to do_*
> 
> why not change SYSCALL_DEFINEx to define kernel wrappers?

Basically, for two reasons: First, only a subset of all syscalls require
such wrappers -- only about a third of all syscalls are called from within
the kernel at the moment (rough guess). Second, and more important: We want
to reduce the amount of such usage; see, e.g., the messages by Christoph and
Arnd in this thread.

Thanks,
Dominik


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Dominik Brodowski
On Fri, Mar 16, 2018 at 09:01:11AM +, Zhang, Ning A wrote:
> 在 2018-03-15四的 20:04 +0100,Dominik Brodowski写道:
> > Here is a re-spin of the first set of patches which reduce the number of
> > syscall invocations from within the kernel; the RFC may be found at
> > 
> > The rationale for this change is described in patch 1 as follows:
> > 
> > The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> > and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> > through kernel entry points, but not from the kernel itself. This
> > will allow cleanups and optimizations to the entry paths *and* to
> > the parts of the kernel code which currently need to pretend to be
> > userspace in order to make use of syscalls.
> 
> I think this is really bad to change syscalls one by one, to do_*
> 
> why not change SYSCALL_DEFINEx to define kernel wrappers?

Basically, for two reasons: First, only a subset of all syscalls require
such wrappers -- only about a third of all syscalls are called from within
the kernel at the moment (rough guess). Second, and more important: We want
to reduce the amount of such usage; see, e.g., the messages by Christoph and
Arnd in this thread.

Thanks,
Dominik


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Zhang, Ning A
在 2018-03-15四的 20:04 +0100,Dominik Brodowski写道:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
> 
> The rationale for this change is described in patch 1 as follows:
> 
>   The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>   and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>   through kernel entry points, but not from the kernel itself. This
>   will allow cleanups and optimizations to the entry paths *and* to
>   the parts of the kernel code which currently need to pretend to be
>   userspace in order to make use of syscalls.

I think this is really bad to change syscalls one by one, to do_*

why not change SYSCALL_DEFINEx to define kernel wrappers?


> 
The whole series can be found at 
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> syscalls-next
> 
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Changes since the RFC / v1:
> 
> - rebase to v4.15-rc5; sys_ioperm already got its SYSCALL_DEFINE3
> - add ACKs
> - CC: -> Cc: (suggested by Ingo Molnar)
> - update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
>   Andy Lutomirski)
> - separate declarations from definitions with newlines in
>   include/linux/syscalls.h; add comment on ksys_close() (suggested by
>   Ingo Molnar)
> - expand commit messages (suggested by Christoph Hellwig)
> - include patch 36:
>   fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> - do not worry about the following archs, as they are going away:
>   cris, frv, metag, mn10300, score, tile
>   (solving conflicts in -next)
> - fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
>   found by Stephen Rothwell)
> 
> Thanks,
>   Dominik
> 
> 
> Dominik Brodowski (36):
>   syscalls: define goal to not call sys_xyzzy() from within the kernel
>   kernel: use kernel_wait4() instead of sys_wait4()
>   mm: use do_futex() instead of sys_futex() in mm_release()
>   kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
>   fs: add do_readlinkat() helper; remove internal call to
> sys_readlinkat()
>   fs: add do_pipe2() helper; remove internal call to sys_pipe2()
>   fs: add do_renameat2() helper; remove internal call to sys_renameat2()
>   fs: add do_futimesat() helper; remove internal call to sys_futimesat()
>   syscalls: add do_epoll_*() helpers; remove internal calls to
> sys_epoll_*()
>   fs: add do_signalfd4() helper; remove internal calls to
> sys_signalfd4()
>   fs: add do_eventfd() helper; remove internal call to sys_eventfd()
>   kernel: open-code sys_rt_sigpending() in sys_sigpending()
>   x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
> sys_ioperm()
>   fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
>   fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
>   fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
>   fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
>   fs: add ksys_write() helper; remove in-kernel calls to sys_write()
>   kernel: add ksys_unshare() helper; remove in-kernel calls to
> sys_unshare()
>   mm: add ksys_fadvise64_64() helper; remove in-kernel call to
> sys_fadvise64_64()
>   mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
> sys_mmap_pgoff()
>   fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
>   fs: add ksys_sync_file_range helper(); remove in-kernel calls to
> syscall
>   fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
>   hostfs: rename do_rmdir() to hostfs_do_rmdir()
>   fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
>   fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
> calls to syscall
>   fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
> in-kernel calls to syscall
>   fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
> calls to syscall
>   fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
> calls to syscall
>   fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
> wrapper; remove in-kernel calls to syscall
>   fs: add do_faccessat() helper and ksys_access() wrapper; remove
> in-kernel calls to syscall
>   fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
> sys_ftruncate()
>   fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
> wrappers
>   fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
>   fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> 
>  Documentation/process/adding-syscalls.rst |  14 ---
>  arch/alpha/kernel/osf_sys.c   |   2 +-
>  

Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Zhang, Ning A
在 2018-03-15四的 20:04 +0100,Dominik Brodowski写道:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
> 
> The rationale for this change is described in patch 1 as follows:
> 
>   The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>   and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>   through kernel entry points, but not from the kernel itself. This
>   will allow cleanups and optimizations to the entry paths *and* to
>   the parts of the kernel code which currently need to pretend to be
>   userspace in order to make use of syscalls.

I think this is really bad to change syscalls one by one, to do_*

why not change SYSCALL_DEFINEx to define kernel wrappers?


> 
The whole series can be found at 
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> syscalls-next
> 
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.
> 
> Changes since the RFC / v1:
> 
> - rebase to v4.15-rc5; sys_ioperm already got its SYSCALL_DEFINE3
> - add ACKs
> - CC: -> Cc: (suggested by Ingo Molnar)
> - update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
>   Andy Lutomirski)
> - separate declarations from definitions with newlines in
>   include/linux/syscalls.h; add comment on ksys_close() (suggested by
>   Ingo Molnar)
> - expand commit messages (suggested by Christoph Hellwig)
> - include patch 36:
>   fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> - do not worry about the following archs, as they are going away:
>   cris, frv, metag, mn10300, score, tile
>   (solving conflicts in -next)
> - fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
>   found by Stephen Rothwell)
> 
> Thanks,
>   Dominik
> 
> 
> Dominik Brodowski (36):
>   syscalls: define goal to not call sys_xyzzy() from within the kernel
>   kernel: use kernel_wait4() instead of sys_wait4()
>   mm: use do_futex() instead of sys_futex() in mm_release()
>   kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
>   fs: add do_readlinkat() helper; remove internal call to
> sys_readlinkat()
>   fs: add do_pipe2() helper; remove internal call to sys_pipe2()
>   fs: add do_renameat2() helper; remove internal call to sys_renameat2()
>   fs: add do_futimesat() helper; remove internal call to sys_futimesat()
>   syscalls: add do_epoll_*() helpers; remove internal calls to
> sys_epoll_*()
>   fs: add do_signalfd4() helper; remove internal calls to
> sys_signalfd4()
>   fs: add do_eventfd() helper; remove internal call to sys_eventfd()
>   kernel: open-code sys_rt_sigpending() in sys_sigpending()
>   x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
> sys_ioperm()
>   fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
>   fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
>   fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
>   fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
>   fs: add ksys_write() helper; remove in-kernel calls to sys_write()
>   kernel: add ksys_unshare() helper; remove in-kernel calls to
> sys_unshare()
>   mm: add ksys_fadvise64_64() helper; remove in-kernel call to
> sys_fadvise64_64()
>   mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
> sys_mmap_pgoff()
>   fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
>   fs: add ksys_sync_file_range helper(); remove in-kernel calls to
> syscall
>   fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
>   hostfs: rename do_rmdir() to hostfs_do_rmdir()
>   fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
>   fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
> calls to syscall
>   fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
> in-kernel calls to syscall
>   fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
> calls to syscall
>   fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
> calls to syscall
>   fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
> wrapper; remove in-kernel calls to syscall
>   fs: add do_faccessat() helper and ksys_access() wrapper; remove
> in-kernel calls to syscall
>   fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
> sys_ftruncate()
>   fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
> wrappers
>   fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
>   fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
> 
>  Documentation/process/adding-syscalls.rst |  14 ---
>  arch/alpha/kernel/osf_sys.c   |   2 +-
>  

Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> Yes. And honestly, I'd rather have these kinds of "just change the
> calling convention" almost automated patches separately - and then the
> cleanups later.
> 
> Mixing the calling convention change and the cleanup together is just
> confusing and potentially causes subtle issues.

A lot of the issues here is that the initramfs / do_mount code
is written as if it was user space code, but in kernel space.  E.g.
using file desriptors etc.  I think doing one or a few patches
before this series to sort this out would really reduce the scope
of work and be the right thing.  For any additional minor cleanups
I agree that it might make sense to postpone them.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-16 Thread Christoph Hellwig
On Thu, Mar 15, 2018 at 05:54:27PM -0700, Linus Torvalds wrote:
> Yes. And honestly, I'd rather have these kinds of "just change the
> calling convention" almost automated patches separately - and then the
> cleanups later.
> 
> Mixing the calling convention change and the cleanup together is just
> confusing and potentially causes subtle issues.

A lot of the issues here is that the initramfs / do_mount code
is written as if it was user space code, but in kernel space.  E.g.
using file desriptors etc.  I think doing one or a few patches
before this series to sort this out would really reduce the scope
of work and be the right thing.  For any additional minor cleanups
I agree that it might make sense to postpone them.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Linus Torvalds
On Thu, Mar 15, 2018 at 5:38 PM, Andy Lutomirski  wrote:
>
> I don't think this patch series should wait for any of these cleanups,
> though.  We need these patches to change the x86_64 internal syscall
> function signature, which we've been wanting to do for a little while.

Yes. And honestly, I'd rather have these kinds of "just change the
calling convention" almost automated patches separately - and then the
cleanups later.

Mixing the calling convention change and the cleanup together is just
confusing and potentially causes subtle issues.

 Linus


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Linus Torvalds
On Thu, Mar 15, 2018 at 5:38 PM, Andy Lutomirski  wrote:
>
> I don't think this patch series should wait for any of these cleanups,
> though.  We need these patches to change the x86_64 internal syscall
> function signature, which we've been wanting to do for a little while.

Yes. And honestly, I'd rather have these kinds of "just change the
calling convention" almost automated patches separately - and then the
cleanups later.

Mixing the calling convention change and the cleanup together is just
confusing and potentially causes subtle issues.

 Linus


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Andy Lutomirski
On Thu, Mar 15, 2018 at 9:02 PM, Arnd Bergmann  wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
>  wrote:
>> Here is a re-spin of the first set of patches which reduce the number of
>> syscall invocations from within the kernel; the RFC may be found at
>>
>> The rationale for this change is described in patch 1 as follows:
>>
>> The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>> and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>> through kernel entry points, but not from the kernel itself. This
>> will allow cleanups and optimizations to the entry paths *and* to
>> the parts of the kernel code which currently need to pretend to be
>> userspace in order to make use of syscalls.
>>
>> The whole series can be found at
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
>> syscalls-next
>>
>> and will be submitted for merging for the v4.17-rc1 cycle, probably together
>> with another batch of related patches I hope to send out tomorrow as a RFC.
>
> Nice work!
>
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscall seem to be harder to change to
> that than others, so not sure if it's worth the effort.

It would be fantastic to get rid of set_fs() entirely and make it
impossible for get_user(), etc to ever access kernel memory.  And this
effort is necessary to ever achieve that.

I don't think this patch series should wait for any of these cleanups,
though.  We need these patches to change the x86_64 internal syscall
function signature, which we've been wanting to do for a little while.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Andy Lutomirski
On Thu, Mar 15, 2018 at 9:02 PM, Arnd Bergmann  wrote:
> On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
>  wrote:
>> Here is a re-spin of the first set of patches which reduce the number of
>> syscall invocations from within the kernel; the RFC may be found at
>>
>> The rationale for this change is described in patch 1 as follows:
>>
>> The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
>> and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
>> through kernel entry points, but not from the kernel itself. This
>> will allow cleanups and optimizations to the entry paths *and* to
>> the parts of the kernel code which currently need to pretend to be
>> userspace in order to make use of syscalls.
>>
>> The whole series can be found at
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
>> syscalls-next
>>
>> and will be submitted for merging for the v4.17-rc1 cycle, probably together
>> with another batch of related patches I hope to send out tomorrow as a RFC.
>
> Nice work!
>
> I've already commented on a few patches that now have a kernel-internal
> helper function that takes a __user pointer. I think those are all only used
> in the early boot code (initramfs etc) that runs before we set_fs() to the
> user address space, but it also causes warnings with sparse. If we
> can change all of them to take kernel pointers, that would let us avoid
> the sparse warnings and start running with a normal user address space
> view. Unfortunately, some of the syscall seem to be harder to change to
> that than others, so not sure if it's worth the effort.

It would be fantastic to get rid of set_fs() entirely and make it
impossible for get_user(), etc to ever access kernel memory.  And this
effort is necessary to ever achieve that.

I don't think this patch series should wait for any of these cleanups,
though.  We need these patches to change the x86_64 internal syscall
function signature, which we've been wanting to do for a little while.


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Arnd Bergmann
On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
 wrote:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
>
> The rationale for this change is described in patch 1 as follows:
>
> The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> through kernel entry points, but not from the kernel itself. This
> will allow cleanups and optimizations to the entry paths *and* to
> the parts of the kernel code which currently need to pretend to be
> userspace in order to make use of syscalls.
>
> The whole series can be found at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> syscalls-next
>
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.

Nice work!

I've already commented on a few patches that now have a kernel-internal
helper function that takes a __user pointer. I think those are all only used
in the early boot code (initramfs etc) that runs before we set_fs() to the
user address space, but it also causes warnings with sparse. If we
can change all of them to take kernel pointers, that would let us avoid
the sparse warnings and start running with a normal user address space
view. Unfortunately, some of the syscall seem to be harder to change to
that than others, so not sure if it's worth the effort.

Another open question are the declarations in include/linux/syscalls.h.
These serve as a help for type-checking today, making sure that
each syscall we refer to from either the syscall table or called
by some kernel function uses the same prototype that matches
the syscall definition, which raises the question of whether we want
to keep the header around at all.

Arnd


Re: [PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Arnd Bergmann
On Thu, Mar 15, 2018 at 8:04 PM, Dominik Brodowski
 wrote:
> Here is a re-spin of the first set of patches which reduce the number of
> syscall invocations from within the kernel; the RFC may be found at
>
> The rationale for this change is described in patch 1 as follows:
>
> The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
> and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
> through kernel entry points, but not from the kernel itself. This
> will allow cleanups and optimizations to the entry paths *and* to
> the parts of the kernel code which currently need to pretend to be
> userspace in order to make use of syscalls.
>
> The whole series can be found at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> syscalls-next
>
> and will be submitted for merging for the v4.17-rc1 cycle, probably together
> with another batch of related patches I hope to send out tomorrow as a RFC.

Nice work!

I've already commented on a few patches that now have a kernel-internal
helper function that takes a __user pointer. I think those are all only used
in the early boot code (initramfs etc) that runs before we set_fs() to the
user address space, but it also causes warnings with sparse. If we
can change all of them to take kernel pointers, that would let us avoid
the sparse warnings and start running with a normal user address space
view. Unfortunately, some of the syscall seem to be harder to change to
that than others, so not sure if it's worth the effort.

Another open question are the declarations in include/linux/syscalls.h.
These serve as a help for type-checking today, making sure that
each syscall we refer to from either the syscall table or called
by some kernel function uses the same prototype that matches
the syscall definition, which raises the question of whether we want
to keep the header around at all.

Arnd


[PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Dominik Brodowski
Here is a re-spin of the first set of patches which reduce the number of
syscall invocations from within the kernel; the RFC may be found at

The rationale for this change is described in patch 1 as follows:

The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
through kernel entry points, but not from the kernel itself. This
will allow cleanups and optimizations to the entry paths *and* to
the parts of the kernel code which currently need to pretend to be
userspace in order to make use of syscalls.

The whole series can be found at 

https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
syscalls-next

and will be submitted for merging for the v4.17-rc1 cycle, probably together
with another batch of related patches I hope to send out tomorrow as a RFC.

Changes since the RFC / v1:

- rebase to v4.15-rc5; sys_ioperm already got its SYSCALL_DEFINE3
- add ACKs
- CC: -> Cc: (suggested by Ingo Molnar)
- update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
Andy Lutomirski)
- separate declarations from definitions with newlines in
include/linux/syscalls.h; add comment on ksys_close() (suggested by
Ingo Molnar)
- expand commit messages (suggested by Christoph Hellwig)
- include patch 36:
fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
- do not worry about the following archs, as they are going away:
cris, frv, metag, mn10300, score, tile
(solving conflicts in -next)
- fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
found by Stephen Rothwell)

Thanks,
Dominik


Dominik Brodowski (36):
  syscalls: define goal to not call sys_xyzzy() from within the kernel
  kernel: use kernel_wait4() instead of sys_wait4()
  mm: use do_futex() instead of sys_futex() in mm_release()
  kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
  fs: add do_readlinkat() helper; remove internal call to
sys_readlinkat()
  fs: add do_pipe2() helper; remove internal call to sys_pipe2()
  fs: add do_renameat2() helper; remove internal call to sys_renameat2()
  fs: add do_futimesat() helper; remove internal call to sys_futimesat()
  syscalls: add do_epoll_*() helpers; remove internal calls to
sys_epoll_*()
  fs: add do_signalfd4() helper; remove internal calls to
sys_signalfd4()
  fs: add do_eventfd() helper; remove internal call to sys_eventfd()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
sys_ioperm()
  fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
  fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  kernel: add ksys_unshare() helper; remove in-kernel calls to
sys_unshare()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to
sys_fadvise64_64()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
sys_mmap_pgoff()
  fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to
syscall
  fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  hostfs: rename do_rmdir() to hostfs_do_rmdir()
  fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
  fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
calls to syscall
  fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
in-kernel calls to syscall
  fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
calls to syscall
  fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
calls to syscall
  fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
wrapper; remove in-kernel calls to syscall
  fs: add do_faccessat() helper and ksys_access() wrapper; remove
in-kernel calls to syscall
  fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
sys_ftruncate()
  fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
wrappers
  fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
  fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()

 Documentation/process/adding-syscalls.rst |  14 ---
 arch/alpha/kernel/osf_sys.c   |   2 +-
 arch/arm/kernel/sys_arm.c |   2 +-
 arch/arm64/kernel/sys.c   |   2 +-
 arch/ia64/kernel/sys_ia64.c   |   4 +-
 arch/m68k/kernel/sys_m68k.c   |   2 +-
 arch/microblaze/kernel/sys_microblaze.c   |   6 +-
 arch/mips/kernel/linux32.c|  10 +-
 arch/mips/kernel/syscall.c|   6 +-
 

[PATCH v2 00/36] remove in-kernel syscall invocations (part 1)

2018-03-15 Thread Dominik Brodowski
Here is a re-spin of the first set of patches which reduce the number of
syscall invocations from within the kernel; the RFC may be found at

The rationale for this change is described in patch 1 as follows:

The syscall entry points to the kernel defined by SYSCALL_DEFINEx()
and COMPAT_SYSCALL_DEFINEx() should only be called from userspace
through kernel entry points, but not from the kernel itself. This
will allow cleanups and optimizations to the entry paths *and* to
the parts of the kernel code which currently need to pretend to be
userspace in order to make use of syscalls.

The whole series can be found at 

https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
syscalls-next

and will be submitted for merging for the v4.17-rc1 cycle, probably together
with another batch of related patches I hope to send out tomorrow as a RFC.

Changes since the RFC / v1:

- rebase to v4.15-rc5; sys_ioperm already got its SYSCALL_DEFINE3
- add ACKs
- CC: -> Cc: (suggested by Ingo Molnar)
- update comment in include/linux/syscalls.h (suggested by Ingo Molnar and
Andy Lutomirski)
- separate declarations from definitions with newlines in
include/linux/syscalls.h; add comment on ksys_close() (suggested by
Ingo Molnar)
- expand commit messages (suggested by Christoph Hellwig)
- include patch 36:
fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
- do not worry about the following archs, as they are going away:
cris, frv, metag, mn10300, score, tile
(solving conflicts in -next)
- fix builds with CONFIG_FUTEX=n, CONFIG_ADVISE_SYSCALLS=n (solving issues
found by Stephen Rothwell)

Thanks,
Dominik


Dominik Brodowski (36):
  syscalls: define goal to not call sys_xyzzy() from within the kernel
  kernel: use kernel_wait4() instead of sys_wait4()
  mm: use do_futex() instead of sys_futex() in mm_release()
  kernel: add do_getpgid() helper; remove internal call to sys_getpgid()
  fs: add do_readlinkat() helper; remove internal call to
sys_readlinkat()
  fs: add do_pipe2() helper; remove internal call to sys_pipe2()
  fs: add do_renameat2() helper; remove internal call to sys_renameat2()
  fs: add do_futimesat() helper; remove internal call to sys_futimesat()
  syscalls: add do_epoll_*() helpers; remove internal calls to
sys_epoll_*()
  fs: add do_signalfd4() helper; remove internal calls to
sys_signalfd4()
  fs: add do_eventfd() helper; remove internal call to sys_eventfd()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to
sys_ioperm()
  fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
  fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
  fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
  fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
  fs: add ksys_write() helper; remove in-kernel calls to sys_write()
  kernel: add ksys_unshare() helper; remove in-kernel calls to
sys_unshare()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to
sys_fadvise64_64()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to
sys_mmap_pgoff()
  fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to
syscall
  fs: add ksys_unlink() wrapper; remove in-kernel calls to sys_unlink()
  hostfs: rename do_rmdir() to hostfs_do_rmdir()
  fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
  fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel
calls to syscall
  fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove
in-kernel calls to syscall
  fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel
calls to syscall
  fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel
calls to syscall
  fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod()
wrapper; remove in-kernel calls to syscall
  fs: add do_faccessat() helper and ksys_access() wrapper; remove
in-kernel calls to syscall
  fs: add ksys_ftruncate() wrapper; remove in-kernel calls to
sys_ftruncate()
  fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown()
wrappers
  fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
  fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()

 Documentation/process/adding-syscalls.rst |  14 ---
 arch/alpha/kernel/osf_sys.c   |   2 +-
 arch/arm/kernel/sys_arm.c |   2 +-
 arch/arm64/kernel/sys.c   |   2 +-
 arch/ia64/kernel/sys_ia64.c   |   4 +-
 arch/m68k/kernel/sys_m68k.c   |   2 +-
 arch/microblaze/kernel/sys_microblaze.c   |   6 +-
 arch/mips/kernel/linux32.c|  10 +-
 arch/mips/kernel/syscall.c|   6 +-