Re: [GIT PULL] remove in-kernel calls to syscalls

2018-04-03 Thread Ingo Molnar

* Linus Torvalds wrote:

> On Mon, Apr 2, 2018 at 12:04 PM, Dominik Brodowski wrote:
> >
> > This patchset removes all in-kernel calls to syscall functions in the
> > kernel with the exception of arch/.
> 
> Ok, this finished off my arch updates for today, I'll probably move on
> to driver pulls tomorrow.
> 
> Anyway, it's in my tree, will push out once my test build finishes.

Thanks!

Dominik, if you submit the x86 ptregs conversion patches in the next 1-2 days
on top of Linus's tree (642e7fd23353), then I can apply them and if they are
problem-free I can perhaps tempt Linus with a pull request early next week or
so.

The Spectre angle does make me want those changes as well.

Thanks,

Ingo


Re: [GIT PULL] remove in-kernel calls to syscalls

2018-04-02 Thread Linus Torvalds
On Mon, Apr 2, 2018 at 12:04 PM, Dominik Brodowski wrote:
>
> This patchset removes all in-kernel calls to syscall functions in the
> kernel with the exception of arch/.

Ok, this finished off my arch updates for today, I'll probably move on
to driver pulls tomorrow.

Anyway, it's in my tree, will push out once my test build finishes.

Linus


[GIT PULL] remove in-kernel calls to syscalls

2018-04-02 Thread Dominik Brodowski
Linus,

please pull the following changes since commit 
0c8efd610b58cb23cefdfa12015799079aef94ae:

  Linux 4.16-rc5 (2018-03-11 17:25:09 -0700)

which are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-next

up to commit c9a211951c7c79cfb5de888d7d9550872868b086:

  bpf: whitelist all syscalls for error injection (2018-04-02 20:16:21 +0200)

to remove all in-kernel calls to syscalls except from arch/.

Since the last time I sent the patches out for review,[*] I have only
added a few more ACKs. Jon Corbet raised the question whether the
documentation really should go to Documentation/process/adding-syscalls.rst
and not to Documentation/process/coding-style.rst (even though, as he said,
that isn't quite right either). As most of the existing instances where
syscalls were called in the kernel were (1) common codepaths for old
and new syscalls, (2) common codepaths for native and compat syscalls, and
(3) syscall multiplexers like sys_ipc(), I have kept it at the former
location for the time being, but will be happy to submit a follow-up patch
to move the documentation bits to a different file.

[*] lkml.kernel.org/r/20180329112426.23043-1-li...@dominikbrodowski.net

All these patches have been in -next, but got rebased a few minutes ago to
include another ACK in patch 2/109 (no code changes). There were/are a few
trivial conflicts against the net, sparc and vfs trees, but not (yet) against
what is in your tree up to commit 86bbbebac1933e6e95e8234c4f7d220c5ddd38bc.

Thanks,
Dominik


System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy()
should only be called from userspace via the syscall table, but not from
elsewhere in the kernel.

At least on 64-bit x86, it will likely be a hard requirement from v4.17
onwards to not call system call functions in the kernel: It is better to
use a different calling convention for system calls there, where
struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
processing over to the actual syscall function. This means that only those
parameters which are actually needed for a specific syscall are passed on
during syscall entry, instead of filling in six CPU registers with random
user space content all the time (which may cause serious trouble down the
call chain). Those x86-specific patches will be pushed through the x86
tree in the near future.
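
To illustrate the calling convention described above, here is a minimal
userspace sketch. It is not taken from the x86 patches; struct
pt_regs_sketch, do_xyzzy(), sys_xyzzy_wrapper() and dispatch_sketch() are
hypothetical names chosen for this example only. The point it tries to show
is that the wrapper is the only function reachable from the table, and that
it passes on just the two registers this particular syscall needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the register frame saved at syscall entry. */
struct pt_regs_sketch {
	unsigned long di, si, dx, r10, r8, r9;
};

/* The actual syscall logic: it takes only the parameters it needs. */
static long do_xyzzy(int fd, unsigned long len)
{
	return (long)fd + (long)len;
}

/*
 * Wrapper: decodes the register frame on the fly and forwards two
 * arguments. The other four registers are never handed down.
 */
static long sys_xyzzy_wrapper(const struct pt_regs_sketch *regs)
{
	return do_xyzzy((int)regs->di, regs->si);
}

typedef long (*sys_call_ptr_sketch_t)(const struct pt_regs_sketch *);

/* Hypothetical one-entry syscall table holding only wrappers. */
static const sys_call_ptr_sketch_t sys_call_table_sketch[] = {
	sys_xyzzy_wrapper,
};

/* Dispatch by syscall number, as an entry path might do. */
static long dispatch_sketch(size_t nr, const struct pt_regs_sketch *regs)
{
	assert(regs != NULL);
	if (nr >= sizeof(sys_call_table_sketch) / sizeof(sys_call_table_sketch[0]))
		return -ENOSYS;
	return sys_call_table_sketch[nr](regs);
}
```

With such a wrapper in place, there is no sys_xyzzy(fd, len) with a normal C
signature left for other kernel code to call, which is why in-kernel callers
have to go away first.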

Moreover, rules on how data may be accessed may differ between kernel data
and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific code.

This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h.

First goes a patch which defines the goal and explains the rationale:

  syscalls: define and explain goal to not call syscalls in the kernel

A few codepaths can trivially be converted to existing in-kernel interfaces:

  kernel: use kernel_wait4() instead of sys_wait4()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  kexec: call do_kexec_load() in compat syscall directly
  mm: use do_futex() instead of sys_futex() in mm_release()
  x86: use _do_fork() in compat_sys_x86_clone()
  x86: remove compat_sys_x86_waitpid()
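
As a rough illustration of the general shape of these conversions (a
userspace sketch, not code from the patches; kernel_xyzzy() and the *_sketch
functions are hypothetical names), the logic lives in an in-kernel helper,
and the native syscall, the compat syscall and any other kernel code all call
that helper rather than calling the syscall function:

```c
#include <stdint.h>

/* Hypothetical in-kernel helper: holds the real logic, callable anywhere. */
static long kernel_xyzzy(unsigned long addr, unsigned long len)
{
	return (long)(addr + len);
}

/* Native syscall entry point: a thin shim around the helper. */
static long sys_xyzzy_sketch(unsigned long addr, unsigned long len)
{
	return kernel_xyzzy(addr, len);
}

/*
 * Compat syscall entry point: widens the 32-bit arguments and calls the
 * helper directly instead of going through sys_xyzzy_sketch().
 */
static long compat_sys_xyzzy_sketch(uint32_t addr, uint32_t len)
{
	return kernel_xyzzy((unsigned long)addr, (unsigned long)len);
}

/* Some other in-kernel user: it also calls the helper, never the syscall. */
static long some_kernel_path_sketch(void)
{
	return kernel_xyzzy(16, 4);
}
```

In the patches listed above, existing interfaces such as kernel_wait4(),
do_kexec_load() and do_futex() play the role of the helper.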

Then follow many patches which only affect specific subsystems each, and
replace sys_*() with internal helpers named __sys_*() or do_sys_*(). Let's
start with net/:

  net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
  net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
  net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
  net: socket: add __sys_socket() helper; remove in-kernel call to syscall
  net: socket: add __sys_bind() helper; remove in-kernel call to syscall
  net: socket: add __sys_connect() helper; remove in-kernel call to syscall
  net: socket: add __sys_listen() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
  net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
  net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
  net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
  net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
  net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
  net: socket: move check for forbid_cmsg_compat to __sys_...msg()
  net: socket: replace calls to sys_send() with __sys_sendto()
  net: