Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-24 Thread Tycho Andersen
Hi Oleg,

On Thu, May 17, 2018 at 05:46:37PM +0200, Oleg Nesterov wrote:
> On 05/17, Tycho Andersen wrote:
> >
> > > From lockdep pov this loop tries to take the same lock twice or more, it 
> > > shoul
> > > complain.
> >
> > I didn't, but I guess that's because it's not trying to take the same lock
> > twice -- the pointer cur is changing in the loop.
> 
> Yes, I see. But this is the same lock for lockdep, it has the same class.

I finally figured this out, I needed CONFIG_PROVE_LOCKING=y too,
anyway, I've added the nesting annotations for v3. Thanks!

Tycho


Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-21 Thread Tycho Andersen
On Sat, May 19, 2018 at 01:01:15PM +0800, kbuild test robot wrote:
> Hi Tycho,
> 
> I love your patch! Yet something to improve:

Whoops, seems I forgot to compile the
!CONFIG_SECCOMP_USER_NOTIFICATION case. Anyways, I've fixed this for
v3.

Tycho


Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread kbuild test robot
Hi Tycho,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: i386-randconfig-a1-05181545 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   kernel/seccomp.c: In function '__seccomp_filter':
   kernel/seccomp.c:891:46: warning: passing argument 2 of 
'seccomp_do_user_notification' makes integer from pointer without a cast
  seccomp_do_user_notification(this_syscall, match, sd);
 ^
   kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 
'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^
>> kernel/seccomp.c:891:53: warning: passing argument 3 of 
>> 'seccomp_do_user_notification' from incompatible pointer type
  seccomp_do_user_notification(this_syscall, match, sd);
^
   kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but 
argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c:891:3: error: too few arguments to function 
'seccomp_do_user_notification'
  seccomp_do_user_notification(this_syscall, match, sd);
  ^
   kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:3: error: implicit declaration of function 
>> 'get_unused_fd_flags' [-Werror=implicit-function-declaration]
  listener = get_unused_fd_flags(O_RDWR);
  ^
>> kernel/seccomp.c:1042:3: error: implicit declaration of function 
>> 'init_listener' [-Werror=implicit-function-declaration]
  listener_f = init_listener(current, prepared);
  ^
   kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer 
without a cast
  listener_f = init_listener(current, prepared);
 ^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 
>> 'put_unused_fd' [-Werror=implicit-function-declaration]
   put_unused_fd(listener);
   ^
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' 
>> [-Werror=implicit-function-declaration]
   fput(listener_f);
   ^
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 
>> 'fd_install' [-Werror=implicit-function-declaration]
   fd_install(listener, listener_f);
   ^
   cc1: some warnings being treated as errors

vim +/get_unused_fd_flags +1036 kernel/seccomp.c

   812  
   813  #ifdef CONFIG_SECCOMP_FILTER
   814  static int __seccomp_filter(int this_syscall, const struct seccomp_data 
*sd,
   815  const bool recheck_after_trace)
   816  {
   817  u32 filter_ret, action;
   818  struct seccomp_filter *match = NULL;
   819  int data;
   820  
   821  /*
   822   * Make sure that any changes to mode from another thread have
   823   * been seen after TIF_SECCOMP was seen.
   824   */
   825  rmb();
   826  
   827  filter_ret = seccomp_run_filters(sd, &match);
   828  data = filter_ret & SECCOMP_RET_DATA;
   829  action = filter_ret & SECCOMP_RET_ACTION_FULL;
   830  
   831  switch (action) {
   832  case SECCOMP_RET_ERRNO:
   833  /* Set low-order bits as an errno, capped at MAX_ERRNO. 
*/
   834  if (data > MAX_ERRNO)
   835  data = MAX_ERRNO;
   836  syscall_set_return_value(current, task_pt_regs(current),
   837   -data, 0);
   838  goto skip;
   839  
   840  case SECCOMP_RET_TRAP:
   841  /* Show the handler the original registers. */
   842  syscall_rollback(current, task_pt_regs(current));
   843  /* Let the filter pass back 16 bits of data. */
   844  seccomp_send_sigsys(this_syscall, data);
   845  goto skip;
   846  
   847  case SECCOMP_RET_TRACE:
   848  /* We've been put in this state by the ptracer already. 
*/
   849  if (recheck_after_trace)
   850  return 0;
   851  
   852  /* ENOSYS these calls if there is no tracer attached. */
   853  if (!ptrace_event_enabled(current, 
PTRACE_EVENT_SECCOMP)) {
   854  syscall_set

Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread kbuild test robot
Hi Tycho,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: x86_64-randconfig-x010-201819 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   kernel/seccomp.c: In function '__seccomp_filter':
>> kernel/seccomp.c:891:46: warning: passing argument 2 of 
>> 'seccomp_do_user_notification' makes integer from pointer without a cast 
>> [-Wint-conversion]
  seccomp_do_user_notification(this_syscall, match, sd);
 ^
   kernel/seccomp.c:802:13: note: expected 'u32 {aka unsigned int}' but 
argument is of type 'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^~~~
>> kernel/seccomp.c:891:53: error: passing argument 3 of 
>> 'seccomp_do_user_notification' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
  seccomp_do_user_notification(this_syscall, match, sd);
^~
   kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but 
argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^~~~
>> kernel/seccomp.c:891:3: error: too few arguments to function 
>> 'seccomp_do_user_notification'
  seccomp_do_user_notification(this_syscall, match, sd);
  ^~~~
   kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^~~~
   kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:14: error: implicit declaration of function 
>> 'get_unused_fd_flags'; did you mean 'getname_flags'? 
>> [-Werror=implicit-function-declaration]
  listener = get_unused_fd_flags(O_RDWR);
 ^~~
 getname_flags
>> kernel/seccomp.c:1042:16: error: implicit declaration of function 
>> 'init_listener'; did you mean 'init_llist_head'? 
>> [-Werror=implicit-function-declaration]
  listener_f = init_listener(current, prepared);
   ^
   init_llist_head
>> kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer 
>> without a cast [-Wint-conversion]
  listener_f = init_listener(current, prepared);
 ^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 
>> 'put_unused_fd'; did you mean 'put_user_ns'? 
>> [-Werror=implicit-function-declaration]
   put_unused_fd(listener);
   ^
   put_user_ns
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput'; did 
>> you mean 'iput'? [-Werror=implicit-function-declaration]
   fput(listener_f);
   ^~~~
   iput
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 
>> 'fd_install'; did you mean 'fs_initcall'? 
>> [-Werror=implicit-function-declaration]
   fd_install(listener, listener_f);
   ^~
   fs_initcall
   cc1: some warnings being treated as errors

vim +/seccomp_do_user_notification +891 kernel/seccomp.c

   738  
   739  static void seccomp_do_user_notification(int this_syscall,
   740   struct seccomp_filter *match,
   741   const struct seccomp_data *sd)
   742  {
   743  int err;
   744  long ret = 0;
   745  struct seccomp_knotif n = {};
   746  
   747  mutex_lock(&match->notify_lock);
   748  if (!match->has_listener) {
   749  err = -ENOSYS;
   750  goto out;
   751  }
   752  
   753  n.pid = current->pid;
   754  n.state = SECCOMP_NOTIFY_INIT;
   755  n.data = sd;
   756  n.id = seccomp_next_notify_id(match);
   757  init_completion(&n.ready);
   758  
   759  list_add(&n.list, &match->notifications);
   760  
   761  mutex_unlock(&match->notify_lock);
   762  up(&match->request);
   763  
   764  err = wait_for_completion_interruptible(&n.ready);
   765  mutex_lock(&match->notify_lock);
   766  
   767  /*
   768   * Here it's possible we got a signal and then had to wait on 
the mutex
   769   * while the reply was sent, so let's be sure there wasn't a 
response
   770   * in the meantime.
   771   */
   772  if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
   77

Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread Tycho Andersen
On Fri, May 18, 2018 at 04:04:16PM +0200, Christian Brauner wrote:
> On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote:
> > +#ifdef CONFIG_SECCOMP_USER_NOTIFICATION
> > +static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
> > +{
> > +   u64 ret = filter->next_id;
> > +
> > +   /* Note: overflow is ok here, the id just needs to be unique */
> > +   filter->next_id++;
> > +
> > +   return ret;
> > +}
> 
> Nit: Depending on how averse people are to relying on side-effects this
> could be simplified to:
> 
> static inline u64 seccomp_next_notify_id(struct seccomp_filter *filter)
> {
> /* Note: Overflow is ok. The id just needs to be unique. */
> return filter->next_id++;
> }

Oh, yes, definitely. I think this is leftover from when this function
worked a different way.

> > +
> > +static void seccomp_do_user_notification(int this_syscall,
> > +struct seccomp_filter *match,
> > +const struct seccomp_data *sd)
> > +{
> > +   int err;
> > +   long ret = 0;
> > +   struct seccomp_knotif n = {};
> > +
> > +   mutex_lock(&match->notify_lock);
> > +   if (!match->has_listener) {
> > +   err = -ENOSYS;
> > +   goto out;
> > +   }
> 
> Nit:
> 
> err = -ENOSYS;
> mutex_lock(&match->notify_lock);
> if (!match->has_listener)
> goto out;
> 
> looks cleaner to me or you do the err initalization at the top of the
> function. :)

Ok :)

> > +
> > +   n.pid = current->pid;
> > +   n.state = SECCOMP_NOTIFY_INIT;
> > +   n.data = sd;
> > +   n.id = seccomp_next_notify_id(match);
> > +   init_completion(&n.ready);
> > +
> > +   list_add(&n.list, &match->notifications);
> > +
> > +   mutex_unlock(&match->notify_lock);
> > +   up(&match->request);
> > +
> > +   err = wait_for_completion_interruptible(&n.ready);
> > +   mutex_lock(&match->notify_lock);
> > +
> > +   /*
> > +* Here it's possible we got a signal and then had to wait on the mutex
> > +* while the reply was sent, so let's be sure there wasn't a response
> > +* in the meantime.
> > +*/
> > +   if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) {
> > +   /*
> > +* We got a signal. Let's tell userspace about it (potentially
> > +* again, if we had already notified them about the first one).
> > +*/
> > +   if (n.state == SECCOMP_NOTIFY_SENT) {
> > +   n.state = SECCOMP_NOTIFY_INIT;
> > +   up(&match->request);
> > +   }
> > +   mutex_unlock(&match->notify_lock);
> > +   err = wait_for_completion_killable(&n.ready);
> > +   mutex_lock(&match->notify_lock);
> > +   if (err < 0)
> > +   goto remove_list;
> > +   }
> > +
> > +   ret = n.val;
> > +   err = n.error;
> > +
> > +   WARN(n.state != SECCOMP_NOTIFY_REPLIED,
> > +"notified about write complete when state is not write");
> 
> Nit: That message seems a little cryptic.

Perhaps we can just drop it. It's just a sanity check, but given the
tests above, it doesn't seem likely.

> > +
> > +remove_list:
> > +   list_del(&n.list);
> > +out:
> > +   mutex_unlock(&match->notify_lock);
> > +   syscall_set_return_value(current, task_pt_regs(current),
> > +err, ret);
> > +}
> > +#else
> > +static void seccomp_do_user_notification(int this_syscall,
> > +u32 action,
> > +struct seccomp_filter *match,
> > +const struct seccomp_data *sd)
> > +{
> > +   WARN(1, "user notification received, but disabled");
> 
> Nit: "received unexpected user notification" might be clearer

Yes, I wonder if we shouldn't just drop this too -- it's not a kernel
bug, but a userspace bug that they're using features that aren't
enabled.

We could enhance the verifier with a static check for
BPF_RET | BPF_K == SECCOMPO_RET_USER_NOTIF and reject such programs if
user notification isn't enabled. Of course, it wouldn't handle the
dynamic case, but it might be useful.

Tycho


Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread Christian Brauner
On Thu, May 17, 2018 at 09:12:15AM -0600, Tycho Andersen wrote:
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.
> 
> The motivation for this is primarily for use with containers. For example,
> if a container does an init_module(), we obviously don't want to load this
> untrusted code, which may be compiled for the wrong version of the kernel
> anyway. Instead, we could parse the module image, figure out which module
> the container is trying to load and load it on the host.
> 
> As another example, containers cannot mknod(), since this checks
> capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or
> /dev/zero should be ok for containers to mknod, but we'd like to avoid hard
> coding some whitelist in the kernel. Another example is mount(), which has
> many security restrictions for good reason, but configuration or runtime
> knowledge could potentially be used to relax these restrictions.
> 
> This patch adds functionality that is already possible via at least two
> other means that I know about, both of which involve ptrace(): first, one
> could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
> Unfortunately this is slow, so a faster version would be to install a
> filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
> Since ptrace allows only one tracer, if the container runtime is that
> tracer, users inside the container (or outside) trying to debug it will not
> be able to use ptrace, which is annoying. It also means that older
> distributions based on Upstart cannot boot inside containers using ptrace,
> since upstart itself uses ptrace to start services.
> 
> The actual implementation of this is fairly small, although getting the
> synchronization right was/is slightly complex.
> 
> Finally, it's worth noting that the classic seccomp TOCTOU of reading
> memory data from the task still applies here, but can be avoided with
> careful design of the userspace handler: if the userspace handler reads all
> of the task memory that is necessary before applying its security policy,
> the tracee's subsequent memory edits will not be read by the tracer.
> 
> v2: * make id a u64; the idea here being that it will never overflow,
>   because 64 is huge (one syscall every nanosecond => wrap every 584
>   years)
> * prevent nesting of user notifications: if someone is already attached
>   the tree in one place, nobody else can attach to the tree
> * notify the listener of signals the tracee receives as well
> * implement poll
> 
> Signed-off-by: Tycho Andersen 
> CC: Kees Cook 
> CC: Andy Lutomirski 
> CC: Oleg Nesterov 
> CC: Eric W. Biederman 
> CC: "Serge E. Hallyn" 
> CC: Christian Brauner 
> CC: Tyler Hicks 
> CC: Akihiro Suda 
> ---
>  arch/Kconfig  |   7 +
>  include/linux/seccomp.h   |   3 +-
>  include/uapi/linux/seccomp.h  |  18 +-
>  kernel/seccomp.c  | 402 +-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 181 +++-
>  5 files changed, 605 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8e0d665c8d53..dd99eef3e049 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -401,6 +401,13 @@ config SECCOMP_FILTER
>  
> See Documentation/prctl/seccomp_filter.txt for details.
>  
> +config SECCOMP_USER_NOTIFICATION
> + bool "Enable the SECCOMP_RET_USER_NOTIF seccomp action"
> + depends on SECCOMP_FILTER
> + help
> +   Enable SECCOMP_RET_USER_NOTIF, a return code which can be used by 
> seccomp
> +   programs to notify a userspace listener that a particular event 
> happened.
> +
>  config HAVE_GCC_PLUGINS
>   bool
>   help
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index c723a5c4e3ff..0fd3e0676a1c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -5,7 +5,8 @@
>  #include 
>  
>  #define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> -  SECCOMP_FILTER_FLAG_LOG)
> +  SECCOMP_FILTER_FLAG_LOG | \
> +  SECCOMP_FILTER_FLAG_GET_LISTENER)
>  
>  #ifdef CONFIG_SECCOMP
>  
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 2a0bd9dd104d..8160e6cad528 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -17,8 +17,9 @@
>  #define SECCOMP_GET_ACTION_AVAIL 2
>  
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC1
> -#define SECCOMP_FILTER_FLAG_LOG  2
> +#define SECCOMP_FILTER_FLAG_TSYNC1
> +#define SECCOMP_FILTER_FLAG_LOG  2
> +#define SECCOMP_FILTER_FLAG_GET_LISTENER 4
>  
>  /*
>   * All BPF programs must return a 32-bit value.
> @@ -34,6 +35,7 @@
>

Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-17 Thread Oleg Nesterov
On 05/17, Tycho Andersen wrote:
>
> > From lockdep pov this loop tries to take the same lock twice or more, it 
> > shoul
> > complain.
>
> I didn't, but I guess that's because it's not trying to take the same lock
> twice -- the pointer cur is changing in the loop.

Yes, I see. But this is the same lock for lockdep, it has the same class.

Oleg.



Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-17 Thread Tycho Andersen
Hi Oleg,

Thanks for taking a look!

On Thu, May 17, 2018 at 05:33:24PM +0200, Oleg Nesterov wrote:
> I didn't read this series yet, and I don't even understand what are you
> trying to do, just one question...
> 
> On 05/17, Tycho Andersen wrote:
> >
> > +static struct file *init_listener(struct task_struct *task,
> > + struct seccomp_filter *filter)
> > +{
> > +   struct file *ret = ERR_PTR(-EBUSY);
> > +   struct seccomp_filter *cur;
> > +   bool have_listener = false;
> > +
> > +   for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> > +   mutex_lock(&cur->notify_lock);
> 
> Did you test this patch with CONFIG_LOCKDEP ?

Yes, with,

CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

> From lockdep pov this loop tries to take the same lock twice or more, it shoul
> complain.

I didn't, but I guess that's because it's not trying to take the same lock
twice -- the pointer cur is changing in the loop. Unless I'm misunderstanding
what you're saying.

The idea behind this code is to lock the entire chain of filters up to the
parent so that we can ensure none of them have a listener installed. This is
based on a suggestion from Andy last review cycle to not allow two listeners,
since nesting would be confusing.

Tycho


Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-17 Thread Oleg Nesterov
I didn't read this series yet, and I don't even understand what are you
trying to do, just one question...

On 05/17, Tycho Andersen wrote:
>
> +static struct file *init_listener(struct task_struct *task,
> +   struct seccomp_filter *filter)
> +{
> + struct file *ret = ERR_PTR(-EBUSY);
> + struct seccomp_filter *cur;
> + bool have_listener = false;
> +
> + for (cur = task->seccomp.filter; cur; cur = cur->prev) {
> + mutex_lock(&cur->notify_lock);

Did you test this patch with CONFIG_LOCKDEP ?

>From lockdep pov this loop tries to take the same lock twice or more, it shoul
complain.

Oleg.