Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Oleg, On Thu, May 17, 2018 at 05:46:37PM +0200, Oleg Nesterov wrote: > On 05/17, Tycho Andersen wrote: > > > > > From lockdep pov this loop tries to take the same lock twice or more, it > > > shoul > > > complain. > > > > I didn't, but I guess that's because it's not trying to take the same lock > > twice -- the pointer cur is changing in the loop. > > Yes, I see. But this is the same lock for lockdep, it has the same class. I finally figured this out, I needed CONFIG_PROVE_LOCKING=y too, anyway, I've added the nesting annotations for v3. Thanks! Tycho
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
On Sat, May 19, 2018 at 01:01:15PM +0800, kbuild test robot wrote: > Hi Tycho, > > I love your patch! Yet something to improve: Whoops, seems I forgot to compile the !CONFIG_SECCOMP_USER_NOTIFICATION case. Anyways, I've fixed this for v3. Tycho
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Tycho, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.17-rc5] [cannot apply to next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527 config: i386-randconfig-a1-05181545 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): kernel/seccomp.c: In function '__seccomp_filter': kernel/seccomp.c:891:46: warning: passing argument 2 of 'seccomp_do_user_notification' makes integer from pointer without a cast seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 'struct seccomp_filter *' static void seccomp_do_user_notification(int this_syscall, ^ >> kernel/seccomp.c:891:53: warning: passing argument 3 of >> 'seccomp_do_user_notification' from incompatible pointer type seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *' static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c:891:3: error: too few arguments to function 'seccomp_do_user_notification' seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: declared here static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c: In function 'seccomp_set_mode_filter': >> kernel/seccomp.c:1036:3: error: implicit declaration of function >> 'get_unused_fd_flags' [-Werror=implicit-function-declaration] listener = get_unused_fd_flags(O_RDWR); ^ >> kernel/seccomp.c:1042:3: error: implicit declaration of function >> 'init_listener' [-Werror=implicit-function-declaration] listener_f = init_listener(current, prepared); ^ kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer without a cast listener_f = init_listener(current, prepared); ^ >> kernel/seccomp.c:1044:4: error: implicit declaration of function >> 'put_unused_fd' [-Werror=implicit-function-declaration] put_unused_fd(listener); ^ >> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' >> [-Werror=implicit-function-declaration] fput(listener_f); ^ >> kernel/seccomp.c:1080:4: error: implicit declaration of function >> 'fd_install' [-Werror=implicit-function-declaration] fd_install(listener, listener_f); ^ cc1: some warnings being treated as errors vim +/get_unused_fd_flags +1036 kernel/seccomp.c 812 813 #ifdef CONFIG_SECCOMP_FILTER 814 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, 815 const bool recheck_after_trace) 816 { 817 u32 filter_ret, action; 818 struct seccomp_filter *match = NULL; 819 int data; 820 821 /* 822 * Make sure that any changes to mode from another thread have 823 * been seen after TIF_SECCOMP was seen. 824 */ 825 rmb(); 826 827 filter_ret = seccomp_run_filters(sd, &match); 828 data = filter_ret & SECCOMP_RET_DATA; 829 action = filter_ret & SECCOMP_RET_ACTION_FULL; 830 831 switch (action) { 832 case SECCOMP_RET_ERRNO: 833 /* Set low-order bits as an errno, capped at MAX_ERRNO. */ 834 if (data > MAX_ERRNO) 835 data = MAX_ERRNO; 836 syscall_set_return_value(current, task_pt_regs(current), 837 -data, 0); 838 goto skip; 839 840 case SECCOMP_RET_TRAP: 841 /* Show the handler the original registers. */ 842 syscall_rollback(current, task_pt_regs(current)); 843 /* Let the filter pass back 16 bits of data. */ 844 seccomp_send_sigsys(this_syscall, data); 845 goto skip; 846 847 case SECCOMP_RET_TRACE: 848 /* We've been put in this state by the ptracer already. */ 849 if (recheck_after_trace) 850 return 0; 851 852 /* ENOSYS these calls if there is no tracer attached. */ 853 if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) { 854 syscall_set
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Tycho, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.17-rc5] [cannot apply to next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527 config: x86_64-randconfig-x010-201819 (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All error/warnings (new ones prefixed by >>): kernel/seccomp.c: In function '__seccomp_filter': >> kernel/seccomp.c:891:46: warning: passing argument 2 of >> 'seccomp_do_user_notification' makes integer from pointer without a cast >> [-Wint-conversion] seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'u32 {aka unsigned int}' but argument is of type 'struct seccomp_filter *' static void seccomp_do_user_notification(int this_syscall, ^~~~ >> kernel/seccomp.c:891:53: error: passing argument 3 of >> 'seccomp_do_user_notification' from incompatible pointer type >> [-Werror=incompatible-pointer-types] seccomp_do_user_notification(this_syscall, match, sd); ^~ kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *' static void seccomp_do_user_notification(int this_syscall, ^~~~ >> kernel/seccomp.c:891:3: error: too few arguments to function >> 'seccomp_do_user_notification' seccomp_do_user_notification(this_syscall, match, sd); ^~~~ kernel/seccomp.c:802:13: note: declared here static void seccomp_do_user_notification(int this_syscall, ^~~~ kernel/seccomp.c: In function 'seccomp_set_mode_filter': >> kernel/seccomp.c:1036:14: error: implicit declaration of function >> 'get_unused_fd_flags'; did you mean 'getname_flags'? >> [-Werror=implicit-function-declaration] listener = get_unused_fd_flags(O_RDWR); ^~~ getname_flags >> kernel/seccomp.c:1042:16: error: implicit declaration of function >> 'init_listener'; did you mean 'init_llist_head'? >> [-Werror=implicit-function-declaration] listener_f = init_listener(current, prepared); ^ init_llist_head >> kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer >> without a cast [-Wint-conversion] listener_f = init_listener(current, prepared); ^ >> kernel/seccomp.c:1044:4: error: implicit declaration of function >> 'put_unused_fd'; did you mean 'put_user_ns'? >> [-Werror=implicit-function-declaration] put_unused_fd(listener); ^ put_user_ns >> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput'; did >> you mean 'iput'? [-Werror=implicit-function-declaration] fput(listener_f); ^~~~ iput >> kernel/seccomp.c:1080:4: error: implicit declaration of function >> 'fd_install'; did you mean 'fs_initcall'? >> [-Werror=implicit-function-declaration] fd_install(listener, listener_f); ^~ fs_initcall cc1: some warnings being treated as errors vim +/seccomp_do_user_notification +891 kernel/seccomp.c 738 739 static void seccomp_do_user_notification(int this_syscall, 740 struct seccomp_filter *match, 741 const struct seccomp_data *sd) 742 { 743 int err; 744 long ret = 0; 745 struct seccomp_knotif n = {}; 746 747 mutex_lock(&match->notify_lock); 748 if (!match->has_listener) { 749 err = -ENOSYS; 750 goto out; 751 } 752 753 n.pid = current->pid; 754 n.state = SECCOMP_NOTIFY_INIT; 755 n.data = sd; 756 n.id = seccomp_next_notify_id(match); 757 init_completion(&n.ready); 758 759 list_add(&n.list, &match->notifications); 760 761 mutex_unlock(&match->notify_lock); 762 up(&match->request); 763 764 err = wait_for_completion_interruptible(&n.ready); 765 mutex_lock(&match->notify_lock); 766 767 /* 768 * Here it's possible we got a signal and then had to wait on the mutex 769 * while the reply was sent, so let's be sure there wasn't a response 770 * in the meantime. 771 */ 772 if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) { 77
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
On Fri, May 18, 2018 at 04:04:16PM +0200, Christian Brauner wrote: > On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote: > > +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION > > +static u64 seccomp_next_notify_id(struct seccomp_filter *filter) > > +{ > > + u64 ret = filter->next_id; > > + > > + /* Note: overflow is ok here, the id just needs to be unique */ > > + filter->next_id++; > > + > > + return ret; > > +} > > Nit: Depending on how averse people are to relying on side-effects this > could be simplified to: > > static inline u64 seccomp_next_notify_id(struct seccomp_filter *filter) > { > /* Note: Overflow is ok. The id just needs to be unique. */ > return filter->next_id++; > } Oh, yes, definitely. I think this is leftover from when this function worked a different way. > > + > > +static void seccomp_do_user_notification(int this_syscall, > > +struct seccomp_filter *match, > > +const struct seccomp_data *sd) > > +{ > > + int err; > > + long ret = 0; > > + struct seccomp_knotif n = {}; > > + > > + mutex_lock(&match->notify_lock); > > + if (!match->has_listener) { > > + err = -ENOSYS; > > + goto out; > > + } > > Nit: > > err = -ENOSYS; > mutex_lock(&match->notify_lock); > if (!match->has_listener) > goto out; > > looks cleaner to me or you do the err initalization at the top of the > function. :) Ok :) > > + > > + n.pid = current->pid; > > + n.state = SECCOMP_NOTIFY_INIT; > > + n.data = sd; > > + n.id = seccomp_next_notify_id(match); > > + init_completion(&n.ready); > > + > > + list_add(&n.list, &match->notifications); > > + > > + mutex_unlock(&match->notify_lock); > > + up(&match->request); > > + > > + err = wait_for_completion_interruptible(&n.ready); > > + mutex_lock(&match->notify_lock); > > + > > + /* > > +* Here it's possible we got a signal and then had to wait on the mutex > > +* while the reply was sent, so let's be sure there wasn't a response > > +* in the meantime. > > +*/ > > + if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) { > > + /* > > +* We got a signal. Let's tell userspace about it (potentially > > +* again, if we had already notified them about the first one). > > +*/ > > + if (n.state == SECCOMP_NOTIFY_SENT) { > > + n.state = SECCOMP_NOTIFY_INIT; > > + up(&match->request); > > + } > > + mutex_unlock(&match->notify_lock); > > + err = wait_for_completion_killable(&n.ready); > > + mutex_lock(&match->notify_lock); > > + if (err < 0) > > + goto remove_list; > > + } > > + > > + ret = n.val; > > + err = n.error; > > + > > + WARN(n.state != SECCOMP_NOTIFY_REPLIED, > > +"notified about write complete when state is not write"); > > Nit: That message seems a little cryptic. Perhaps we can just drop it. It's just a sanity check, but given the tests above, it doesn't seem likely. > > + > > +remove_list: > > + list_del(&n.list); > > +out: > > + mutex_unlock(&match->notify_lock); > > + syscall_set_return_value(current, task_pt_regs(current), > > +err, ret); > > +} > > +#else > > +static void seccomp_do_user_notification(int this_syscall, > > +u32 action, > > +struct seccomp_filter *match, > > +const struct seccomp_data *sd) > > +{ > > + WARN(1, "user notification received, but disabled"); > > Nit: "received unexpected user notification" might be clearer Yes, I wonder if we shouldn't just drop this too -- it's not a kernel bug, but a userspace bug that they're using features that aren't enabled. We could enhance the verifier with a static check for BPF_RET | BPF_K == SECCOMPO_RET_USER_NOTIF and reject such programs if user notification isn't enabled. Of course, it wouldn't handle the dynamic case, but it might be useful. Tycho
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote: > This patch introduces a means for syscalls matched in seccomp to notify > some other task that a particular filter has been triggered. > > The motivation for this is primarily for use with containers. For example, > if a container does an init_module(), we obviously don't want to load this > untrusted code, which may be compiled for the wrong version of the kernel > anyway. Instead, we could parse the module image, figure out which module > the container is trying to load and load it on the host. > > As another example, containers cannot mknod(), since this checks > capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or > /dev/zero should be ok for containers to mknod, but we'd like to avoid hard > coding some whitelist in the kernel. Another example is mount(), which has > many security restrictions for good reason, but configuration or runtime > knowledge could potentially be used to relax these restrictions. > > This patch adds functionality that is already possible via at least two > other means that I know about, both of which involve ptrace(): first, one > could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL. > Unfortunately this is slow, so a faster version would be to install a > filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP. > Since ptrace allows only one tracer, if the container runtime is that > tracer, users inside the container (or outside) trying to debug it will not > be able to use ptrace, which is annoying. It also means that older > distributions based on Upstart cannot boot inside containers using ptrace, > since upstart itself uses ptrace to start services. > > The actual implementation of this is fairly small, although getting the > synchronization right was/is slightly complex. > > Finally, it's worth noting that the classic seccomp TOCTOU of reading > memory data from the task still applies here, but can be avoided with > careful design of the userspace handler: if the userspace handler reads all > of the task memory that is necessary before applying its security policy, > the tracee's subsequent memory edits will not be read by the tracer. > > v2: * make id a u64; the idea here being that it will never overflow, > because 64 is huge (one syscall every nanosecond => wrap every 584 > years) > * prevent nesting of user notifications: if someone is already attached > the tree in one place, nobody else can attach to the tree > * notify the listener of signals the tracee receives as well > * implement poll > > Signed-off-by: Tycho Andersen > CC: Kees Cook > CC: Andy Lutomirski > CC: Oleg Nesterov > CC: Eric W. Biederman > CC: "Serge E. Hallyn" > CC: Christian Brauner > CC: Tyler Hicks > CC: Akihiro Suda > --- > arch/Kconfig | 7 + > include/linux/seccomp.h | 3 +- > include/uapi/linux/seccomp.h | 18 +- > kernel/seccomp.c | 402 +- > tools/testing/selftests/seccomp/seccomp_bpf.c | 181 +++- > 5 files changed, 605 insertions(+), 6 deletions(-) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 8e0d665c8d53..dd99eef3e049 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -401,6 +401,13 @@ config SECCOMP_FILTER > > See Documentation/prctl/seccomp_filter.txt for details. > > +config SECCOMP_USER_NOTIFICATION > + bool "Enable the SECCOMP_RET_USER_NOTIF seccomp action" > + depends on SECCOMP_FILTER > + help > + Enable SECCOMP_RET_USER_NOTIF, a return code which can be used by > seccomp > + programs to notify a userspace listener that a particular event > happened. > + > config HAVE_GCC_PLUGINS > bool > help > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > index c723a5c4e3ff..0fd3e0676a1c 100644 > --- a/include/linux/seccomp.h > +++ b/include/linux/seccomp.h > @@ -5,7 +5,8 @@ > #include > > #define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ > - SECCOMP_FILTER_FLAG_LOG) > + SECCOMP_FILTER_FLAG_LOG | \ > + SECCOMP_FILTER_FLAG_GET_LISTENER) > > #ifdef CONFIG_SECCOMP > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > index 2a0bd9dd104d..8160e6cad528 100644 > --- a/include/uapi/linux/seccomp.h > +++ b/include/uapi/linux/seccomp.h > @@ -17,8 +17,9 @@ > #define SECCOMP_GET_ACTION_AVAIL 2 > > /* Valid flags for SECCOMP_SET_MODE_FILTER */ > -#define SECCOMP_FILTER_FLAG_TSYNC1 > -#define SECCOMP_FILTER_FLAG_LOG 2 > +#define SECCOMP_FILTER_FLAG_TSYNC1 > +#define SECCOMP_FILTER_FLAG_LOG 2 > +#define SECCOMP_FILTER_FLAG_GET_LISTENER 4 > > /* > * All BPF programs must return a 32-bit value. > @@ -34,6 +35,7 @@ >
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
On 05/17, Tycho Andersen wrote: > > > From lockdep pov this loop tries to take the same lock twice or more, it > > shoul > > complain. > > I didn't, but I guess that's because it's not trying to take the same lock > twice -- the pointer cur is changing in the loop. Yes, I see. But this is the same lock for lockdep, it has the same class. Oleg.
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Oleg, Thanks for taking a look! On Thu, May 17, 2018 at 05:33:24PM +0200, Oleg Nesterov wrote: > I didn't read this series yet, and I don't even understand what are you > trying to do, just one question... > > On 05/17, Tycho Andersen wrote: > > > > +static struct file *init_listener(struct task_struct *task, > > + struct seccomp_filter *filter) > > +{ > > + struct file *ret = ERR_PTR(-EBUSY); > > + struct seccomp_filter *cur; > > + bool have_listener = false; > > + > > + for (cur = task->seccomp.filter; cur; cur = cur->prev) { > > + mutex_lock(&cur->notify_lock); > > Did you test this patch with CONFIG_LOCKDEP ? Yes, with, CONFIG_LOCKDEP=y CONFIG_DEBUG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y > From lockdep pov this loop tries to take the same lock twice or more, it shoul > complain. I didn't, but I guess that's because it's not trying to take the same lock twice -- the pointer cur is changing in the loop. Unless I'm misunderstanding what you're saying. The idea behind this code is to lock the entire chain of filters up to the parent so that we can ensure none of them have a listener installed. This is based on a suggestion from Andy last review cycle to not allow two listeners, since nesting would be confusing. Tycho
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
I didn't read this series yet, and I don't even understand what are you trying to do, just one question... On 05/17, Tycho Andersen wrote: > > +static struct file *init_listener(struct task_struct *task, > + struct seccomp_filter *filter) > +{ > + struct file *ret = ERR_PTR(-EBUSY); > + struct seccomp_filter *cur; > + bool have_listener = false; > + > + for (cur = task->seccomp.filter; cur; cur = cur->prev) { > + mutex_lock(&cur->notify_lock); Did you test this patch with CONFIG_LOCKDEP ? >From lockdep pov this loop tries to take the same lock twice or more, it shoul complain. Oleg.