Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-13 Thread Will Drewry
On Wed, Jun 11, 2014 at 5:32 PM, Kees Cook  wrote:
> On Wed, Jun 11, 2014 at 3:28 PM, Andy Lutomirski  wrote:
>> On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin  wrote:
>>> On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
 On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin  wrote:
> On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
>>
>> 13ns is with the simplest nonempty filter.  I hope that empty filters
>> don't work.
>>
>
> Why wouldn't they?

 Is it permissible to fall off the end of a BPF program?  I'm getting
 EINVAL trying to install an actual empty filter.  The filter I tested
 with was:

>>>
>>> What I meant was that there has to be a well-defined behavior for the
>>> program falling off the end anyway, and that that should be preserved.
>>>
>>> I guess it is possible to require that all code paths must provably
>>> reach a termination point.
>>>
>>
>> Dunno.  I haven't ever touched any of the actual BPF code.  This whole
>> patchset only changes the code that invokes the BPF evaluator.
>
> Yes, this is how BPF works: runs to the end or exit early. With
> seccomp BPF specifically, the return value defaults to kill the
> process. If a filter was missing (NULL), or empty, or didn't
> explicitly return with a new value, the default (kill) should be
> taken.

Yup - this is just a property of BPF (and a nice one :)

On seccomp_attach_filter this check fires:
  if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
return -EINVAL;

As well as in sk_chk_filter:
  if (flen == 0 || flen > BPF_MAXINSNS)
return -EINVAL;

And:
  /* last instruction must be a RET code */
  switch (filter[flen - 1].code) {
case BPF_S_RET_K:
case BPF_S_RET_A:
  return check_load_and_stores(filter, flen);
  }

cheers!
will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-13 Thread Will Drewry
On Wed, Jun 11, 2014 at 5:32 PM, Kees Cook keesc...@chromium.org wrote:
 On Wed, Jun 11, 2014 at 3:28 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
 On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.


 Why wouldn't they?

 Is it permissible to fall off the end of a BPF program?  I'm getting
 EINVAL trying to install an actual empty filter.  The filter I tested
 with was:


 What I meant was that there has to be a well-defined behavior for the
 program falling off the end anyway, and that that should be preserved.

 I guess it is possible to require that all code paths must provably
 reach a termination point.


 Dunno.  I haven't ever touched any of the actual BPF code.  This whole
 patchset only changes the code that invokes the BPF evaluator.

 Yes, this is how BPF works: runs to the end or exit early. With
 seccomp BPF specifically, the return value defaults to kill the
 process. If a filter was missing (NULL), or empty, or didn't
 explicitly return with a new value, the default (kill) should be
 taken.

Yup - this is just a property of BPF (and a nice one :)

On seccomp_attach_filter this check fires:
  if (fprog-len == 0 || fprog-len  BPF_MAXINSNS)
return -EINVAL;

As well as in sk_chk_filter:
  if (flen == 0 || flen  BPF_MAXINSNS)
return -EINVAL;

And:
  /* last instruction must be a RET code */
  switch (filter[flen - 1].code) {
case BPF_S_RET_K:
case BPF_S_RET_A:
  return check_load_and_stores(filter, flen);
  }

cheers!
will
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Kees Cook
On Wed, Jun 11, 2014 at 3:28 PM, Andy Lutomirski  wrote:
> On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin  wrote:
>> On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
>>> On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin  wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
>
> 13ns is with the simplest nonempty filter.  I hope that empty filters
> don't work.
>

 Why wouldn't they?
>>>
>>> Is it permissible to fall off the end of a BPF program?  I'm getting
>>> EINVAL trying to install an actual empty filter.  The filter I tested
>>> with was:
>>>
>>
>> What I meant was that there has to be a well-defined behavior for the
>> program falling off the end anyway, and that that should be preserved.
>>
>> I guess it is possible to require that all code paths must provably
>> reach a termination point.
>>
>
> Dunno.  I haven't ever touched any of the actual BPF code.  This whole
> patchset only changes the code that invokes the BPF evaluator.

Yes, this is how BPF works: runs to the end or exit early. With
seccomp BPF specifically, the return value defaults to kill the
process. If a filter was missing (NULL), or empty, or didn't
explicitly return with a new value, the default (kill) should be
taken.

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin  wrote:
> On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
>> On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin  wrote:
>>> On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.

>>>
>>> Why wouldn't they?
>>
>> Is it permissible to fall off the end of a BPF program?  I'm getting
>> EINVAL trying to install an actual empty filter.  The filter I tested
>> with was:
>>
>
> What I meant was that there has to be a well-defined behavior for the
> program falling off the end anyway, and that that should be preserved.
>
> I guess it is possible to require that all code paths must provably
> reach a termination point.
>

Dunno.  I haven't ever touched any of the actual BPF code.  This whole
patchset only changes the code that invokes the BPF evaluator.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
> On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin  wrote:
>> On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
>>>
>>> 13ns is with the simplest nonempty filter.  I hope that empty filters
>>> don't work.
>>>
>>
>> Why wouldn't they?
> 
> Is it permissible to fall off the end of a BPF program?  I'm getting
> EINVAL trying to install an actual empty filter.  The filter I tested
> with was:
> 

What I meant was that there has to be a well-defined behavior for the
program falling off the end anyway, and that that should be preserved.

I guess it is possible to require that all code paths must provably
reach a termination point.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin  wrote:
> On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
>>
>> 13ns is with the simplest nonempty filter.  I hope that empty filters
>> don't work.
>>
>
> Why wouldn't they?

Is it permissible to fall off the end of a BPF program?  I'm getting
EINVAL trying to install an actual empty filter.  The filter I tested
with was:

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
int rc;

struct sock_filter filter[] = {
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
};

struct sock_fprog prog = {
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
.filter = filter,
};

if (argc < 2) {
printf("Usage: null_seccomp PATH ARGS...\n");
return 1;
}

if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
err(1, "PR_SET_NO_NEW_PRIVS");
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ))
err(1, "PR_SET_SECCOMP");

execv(argv[1], argv + 1);
err(1, argv[1]);
}


--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
> 
> 13ns is with the simplest nonempty filter.  I hope that empty filters
> don't work.
> 

Why wouldn't they?

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 2:29 PM, Alexei Starovoitov
 wrote:
> On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski  wrote:
>> On my VM, getpid takes about 70ns.  Before this patch, adding a
>> single-instruction always-accept seccomp filter added about 134ns of
>> overhead to getpid.  With this patch, the overhead is down to about
>> 13ns.
>
> interesting.
> Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5?
> 13ns is still with seccomp enabled, but without filters?

13ns is with the simplest nonempty filter.  I hope that empty filters
don't work.

>
>> I'm not really thrilled by this patch.  It has two main issues:
>>
>> 1. Calling into code in kernel/seccomp.c from assembly feels ugly.
>>
>> 2. The x86 64-bit syscall entry now has four separate code paths:
>> fast, seccomp only, audit only, and slow.  This kind of sucks.
>> Would it be worth trying to rewrite the whole thing in C with a
>> two-phase slow path approach like I'm using here for seccomp?
>>
>> Signed-off-by: Andy Lutomirski 
>> ---
>>  arch/x86/kernel/entry_64.S | 45 
>> +
>>  include/linux/seccomp.h|  4 ++--
>>  2 files changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index f9e713a..feb32b2 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -683,6 +683,45 @@ sysret_signal:
>> FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>> jmp int_check_syscall_exit_work
>>
>> +#ifdef CONFIG_SECCOMP
>> +   /*
>> +* Fast path for seccomp without any other slow path triggers.
>> +*/
>> +seccomp_fastpath:
>> +   /* Build seccomp_data */
>> +   pushq %r9   /* args[5] */
>> +   pushq %r8   /* args[4] */
>> +   pushq %r10  /* args[3] */
>> +   pushq %rdx  /* args[2] */
>> +   pushq %rsi  /* args[1] */
>> +   pushq %rdi  /* args[0] */
>> +   pushq RIP-ARGOFFSET+6*8(%rsp)   /* rip */

>> +   pushq %rax  /* nr and junk */
>> +   movl $AUDIT_ARCH_X86_64, 4(%rsp)/* arch */

It wouldn't shock me if this pair of instructions were
microarchitecturally bad.  Maybe I can save a few more cycles by using
bitwise arithmetic or a pair of movls and explicit rsp manipulation
here.  I haven't tried.

>> +   movq %rsp, %rdi
>> +   call seccomp_phase1
>
> the assembler code is pretty much repeating what C does in
> populate_seccomp_data(). Assuming the whole gain came from
> patch 5 why asm version is so much faster than C?
>
> it skips SAVE/RESTORE_REST... what else?
> If the most of the gain is from all patches combined (mainly from
> 2 phase approach) then why bother with asm?

The whole gain should be patch 5, but there are three things going on here.

The biggest win is skipping int_ret_from_sys_call.  IRET sucks.
There's some extra win from skipping SAVE/RESTORE_REST, but I haven't
benchmarked that.  I would guess it's on the order of 5ns.  In theory
a one-pass implementation could skip int_ret_from_sys_call, but that
will be awkward to implement correctly.

The other benefit is generating seccomp_data in assembly.  It saves
about 7ns.  This is likely due to avoiding all the indirection in
syscall_xyz and to avoiding prodding at flags to figure out the arch
token.

Fiddling with branch prediction made no difference that I could measure.

>
> Somehow it feels that the gain is due to better branch prediction
> in asm version. If you change few 'unlikely' in C to 'likely', it may
> get to the same performance...
>
> btw patches #1-3 look good to me. especially #1 is nice.

Thanks :)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Alexei Starovoitov
On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski  wrote:
> On my VM, getpid takes about 70ns.  Before this patch, adding a
> single-instruction always-accept seccomp filter added about 134ns of
> overhead to getpid.  With this patch, the overhead is down to about
> 13ns.

interesting.
Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5?
13ns is still with seccomp enabled, but without filters?

> I'm not really thrilled by this patch.  It has two main issues:
>
> 1. Calling into code in kernel/seccomp.c from assembly feels ugly.
>
> 2. The x86 64-bit syscall entry now has four separate code paths:
> fast, seccomp only, audit only, and slow.  This kind of sucks.
> Would it be worth trying to rewrite the whole thing in C with a
> two-phase slow path approach like I'm using here for seccomp?
>
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/kernel/entry_64.S | 45 +
>  include/linux/seccomp.h|  4 ++--
>  2 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index f9e713a..feb32b2 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -683,6 +683,45 @@ sysret_signal:
> FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
> jmp int_check_syscall_exit_work
>
> +#ifdef CONFIG_SECCOMP
> +   /*
> +* Fast path for seccomp without any other slow path triggers.
> +*/
> +seccomp_fastpath:
> +   /* Build seccomp_data */
> +   pushq %r9   /* args[5] */
> +   pushq %r8   /* args[4] */
> +   pushq %r10  /* args[3] */
> +   pushq %rdx  /* args[2] */
> +   pushq %rsi  /* args[1] */
> +   pushq %rdi  /* args[0] */
> +   pushq RIP-ARGOFFSET+6*8(%rsp)   /* rip */
> +   pushq %rax  /* nr and junk */
> +   movl $AUDIT_ARCH_X86_64, 4(%rsp)/* arch */
> +   movq %rsp, %rdi
> +   call seccomp_phase1

the assembler code is pretty much repeating what C does in
populate_seccomp_data(). Assuming the whole gain came from
patch 5 why asm version is so much faster than C?
it skips SAVE/RESTORE_REST... what else?
If the most of the gain is from all patches combined (mainly from
2 phase approach) then why bother with asm?

Somehow it feels that the gain is due to better branch prediction
in asm version. If you change few 'unlikely' in C to 'likely', it may
get to the same performance...

btw patches #1-3 look good to me. especially #1 is nice.

> +   addq $8*8, %rsp
> +   cmpq $1, %rax
> +   ja seccomp_invoke_phase2
> +   LOAD_ARGS 0  /* restore clobbered regs */
> +   jb system_call_fastpath
> +   jmp ret_from_sys_call
> +
> +seccomp_invoke_phase2:
> +   SAVE_REST
> +   FIXUP_TOP_OF_STACK %rdi
> +   movq %rax,%rdi
> +   call seccomp_phase2
> +
> +   /* if seccomp says to skip, then set orig_ax to -1 and skip */
> +   test %eax,%eax
> +   jz 1f
> +   movq $-1, ORIG_RAX(%rsp)
> +1:
> +   mov ORIG_RAX(%rsp), %rax/* reload rax */
> +   jmp system_call_post_trace  /* and maybe do the syscall */
> +#endif
> +
>  #ifdef CONFIG_AUDITSYSCALL
> /*
>  * Fast path for syscall audit without full syscall trace.
> @@ -717,6 +756,10 @@ sysret_audit:
>
> /* Do syscall tracing */
>  tracesys:
> +#ifdef CONFIG_SECCOMP
> +   testl $(_TIF_WORK_SYSCALL_ENTRY & 
> ~_TIF_SECCOMP),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> +   jz seccomp_fastpath
> +#endif
>  #ifdef CONFIG_AUDITSYSCALL
> testl $(_TIF_WORK_SYSCALL_ENTRY & 
> ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> jz auditsys
> @@ -725,6 +768,8 @@ tracesys:
> FIXUP_TOP_OF_STACK %rdi
> movq %rsp,%rdi
> call syscall_trace_enter
> +
> +system_call_post_trace:
> /*
>  * Reload arg registers from stack in case ptrace changed them.
>  * We don't reload %rax because syscall_trace_enter() returned
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 4fc7a84..d3d4c52 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -37,8 +37,8 @@ static inline int secure_computing(void)
>  #define SECCOMP_PHASE1_OK  0
>  #define SECCOMP_PHASE1_SKIP1
>
> -extern u32 seccomp_phase1(struct seccomp_data *sd);
> -int seccomp_phase2(u32 phase1_result);
> +asmlinkage __visible extern u32 seccomp_phase1(struct seccomp_data *sd);
> +asmlinkage __visible int seccomp_phase2(u32 phase1_result);
>  #else
>  extern void secure_computing_strict(int this_syscall);
>  #endif
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  

Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Alexei Starovoitov
On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski l...@amacapital.net wrote:
 On my VM, getpid takes about 70ns.  Before this patch, adding a
 single-instruction always-accept seccomp filter added about 134ns of
 overhead to getpid.  With this patch, the overhead is down to about
 13ns.

interesting.
Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5?
13ns is still with seccomp enabled, but without filters?

 I'm not really thrilled by this patch.  It has two main issues:

 1. Calling into code in kernel/seccomp.c from assembly feels ugly.

 2. The x86 64-bit syscall entry now has four separate code paths:
 fast, seccomp only, audit only, and slow.  This kind of sucks.
 Would it be worth trying to rewrite the whole thing in C with a
 two-phase slow path approach like I'm using here for seccomp?

 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  arch/x86/kernel/entry_64.S | 45 +
  include/linux/seccomp.h|  4 ++--
  2 files changed, 47 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
 index f9e713a..feb32b2 100644
 --- a/arch/x86/kernel/entry_64.S
 +++ b/arch/x86/kernel/entry_64.S
 @@ -683,6 +683,45 @@ sysret_signal:
 FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
 jmp int_check_syscall_exit_work

 +#ifdef CONFIG_SECCOMP
 +   /*
 +* Fast path for seccomp without any other slow path triggers.
 +*/
 +seccomp_fastpath:
 +   /* Build seccomp_data */
 +   pushq %r9   /* args[5] */
 +   pushq %r8   /* args[4] */
 +   pushq %r10  /* args[3] */
 +   pushq %rdx  /* args[2] */
 +   pushq %rsi  /* args[1] */
 +   pushq %rdi  /* args[0] */
 +   pushq RIP-ARGOFFSET+6*8(%rsp)   /* rip */
 +   pushq %rax  /* nr and junk */
 +   movl $AUDIT_ARCH_X86_64, 4(%rsp)/* arch */
 +   movq %rsp, %rdi
 +   call seccomp_phase1

the assembler code is pretty much repeating what C does in
populate_seccomp_data(). Assuming the whole gain came from
patch 5 why asm version is so much faster than C?
it skips SAVE/RESTORE_REST... what else?
If the most of the gain is from all patches combined (mainly from
2 phase approach) then why bother with asm?

Somehow it feels that the gain is due to better branch prediction
in asm version. If you change few 'unlikely' in C to 'likely', it may
get to the same performance...

btw patches #1-3 look good to me. especially #1 is nice.

 +   addq $8*8, %rsp
 +   cmpq $1, %rax
 +   ja seccomp_invoke_phase2
 +   LOAD_ARGS 0  /* restore clobbered regs */
 +   jb system_call_fastpath
 +   jmp ret_from_sys_call
 +
 +seccomp_invoke_phase2:
 +   SAVE_REST
 +   FIXUP_TOP_OF_STACK %rdi
 +   movq %rax,%rdi
 +   call seccomp_phase2
 +
 +   /* if seccomp says to skip, then set orig_ax to -1 and skip */
 +   test %eax,%eax
 +   jz 1f
 +   movq $-1, ORIG_RAX(%rsp)
 +1:
 +   mov ORIG_RAX(%rsp), %rax/* reload rax */
 +   jmp system_call_post_trace  /* and maybe do the syscall */
 +#endif
 +
  #ifdef CONFIG_AUDITSYSCALL
 /*
  * Fast path for syscall audit without full syscall trace.
 @@ -717,6 +756,10 @@ sysret_audit:

 /* Do syscall tracing */
  tracesys:
 +#ifdef CONFIG_SECCOMP
 +   testl $(_TIF_WORK_SYSCALL_ENTRY  
 ~_TIF_SECCOMP),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 +   jz seccomp_fastpath
 +#endif
  #ifdef CONFIG_AUDITSYSCALL
 testl $(_TIF_WORK_SYSCALL_ENTRY  
 ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 jz auditsys
 @@ -725,6 +768,8 @@ tracesys:
 FIXUP_TOP_OF_STACK %rdi
 movq %rsp,%rdi
 call syscall_trace_enter
 +
 +system_call_post_trace:
 /*
  * Reload arg registers from stack in case ptrace changed them.
  * We don't reload %rax because syscall_trace_enter() returned
 diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
 index 4fc7a84..d3d4c52 100644
 --- a/include/linux/seccomp.h
 +++ b/include/linux/seccomp.h
 @@ -37,8 +37,8 @@ static inline int secure_computing(void)
  #define SECCOMP_PHASE1_OK  0
  #define SECCOMP_PHASE1_SKIP1

 -extern u32 seccomp_phase1(struct seccomp_data *sd);
 -int seccomp_phase2(u32 phase1_result);
 +asmlinkage __visible extern u32 seccomp_phase1(struct seccomp_data *sd);
 +asmlinkage __visible int seccomp_phase2(u32 phase1_result);
  #else
  extern void secure_computing_strict(int this_syscall);
  #endif
 --
 1.9.3

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  

Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 2:29 PM, Alexei Starovoitov
alexei.starovoi...@gmail.com wrote:
 On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski l...@amacapital.net wrote:
 On my VM, getpid takes about 70ns.  Before this patch, adding a
 single-instruction always-accept seccomp filter added about 134ns of
 overhead to getpid.  With this patch, the overhead is down to about
 13ns.

 interesting.
 Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5?
 13ns is still with seccomp enabled, but without filters?

13ns is with the simplest nonempty filter.  I hope that empty filters
don't work.


 I'm not really thrilled by this patch.  It has two main issues:

 1. Calling into code in kernel/seccomp.c from assembly feels ugly.

 2. The x86 64-bit syscall entry now has four separate code paths:
 fast, seccomp only, audit only, and slow.  This kind of sucks.
 Would it be worth trying to rewrite the whole thing in C with a
 two-phase slow path approach like I'm using here for seccomp?

 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  arch/x86/kernel/entry_64.S | 45 
 +
  include/linux/seccomp.h|  4 ++--
  2 files changed, 47 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
 index f9e713a..feb32b2 100644
 --- a/arch/x86/kernel/entry_64.S
 +++ b/arch/x86/kernel/entry_64.S
 @@ -683,6 +683,45 @@ sysret_signal:
 FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
 jmp int_check_syscall_exit_work

 +#ifdef CONFIG_SECCOMP
 +   /*
 +* Fast path for seccomp without any other slow path triggers.
 +*/
 +seccomp_fastpath:
 +   /* Build seccomp_data */
 +   pushq %r9   /* args[5] */
 +   pushq %r8   /* args[4] */
 +   pushq %r10  /* args[3] */
 +   pushq %rdx  /* args[2] */
 +   pushq %rsi  /* args[1] */
 +   pushq %rdi  /* args[0] */
 +   pushq RIP-ARGOFFSET+6*8(%rsp)   /* rip */

 +   pushq %rax  /* nr and junk */
 +   movl $AUDIT_ARCH_X86_64, 4(%rsp)/* arch */

It wouldn't shock me if this pair of instructions were
microarchitecturally bad.  Maybe I can save a few more cycles by using
bitwise arithmetic or a pair of movls and explicit rsp manipulation
here.  I haven't tried.

 +   movq %rsp, %rdi
 +   call seccomp_phase1

 the assembler code is pretty much repeating what C does in
 populate_seccomp_data(). Assuming the whole gain came from
 patch 5 why asm version is so much faster than C?

 it skips SAVE/RESTORE_REST... what else?
 If the most of the gain is from all patches combined (mainly from
 2 phase approach) then why bother with asm?

The whole gain should be patch 5, but there are three things going on here.

The biggest win is skipping int_ret_from_sys_call.  IRET sucks.
There's some extra win from skipping SAVE/RESTORE_REST, but I haven't
benchmarked that.  I would guess it's on the order of 5ns.  In theory
a one-pass implementation could skip int_ret_from_sys_call, but that
will be awkward to implement correctly.

The other benefit is generating seccomp_data in assembly.  It saves
about 7ns.  This is likely due to avoiding all the indirection in
syscall_xyz and to avoiding prodding at flags to figure out the arch
token.

Fiddling with branch prediction made no difference that I could measure.


 Somehow it feels that the gain is due to better branch prediction
 in asm version. If you change few 'unlikely' in C to 'likely', it may
 get to the same performance...

 btw patches #1-3 look good to me. especially #1 is nice.

Thanks :)

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 02:56 PM, Andy Lutomirski wrote:
 
 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.
 

Why wouldn't they?

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.


 Why wouldn't they?

Is it permissible to fall off the end of a BPF program?  I'm getting
EINVAL trying to install an actual empty filter.  The filter I tested
with was:

#include unistd.h
#include linux/filter.h
#include linux/seccomp.h
#include sys/syscall.h
#include err.h
#include sys/prctl.h
#include stddef.h
#include stdio.h

int main(int argc, char **argv)
{
int rc;

struct sock_filter filter[] = {
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
};

struct sock_fprog prog = {
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
.filter = filter,
};

if (argc  2) {
printf(Usage: null_seccomp PATH ARGS...\n);
return 1;
}

if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
err(1, PR_SET_NO_NEW_PRIVS);
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog))
err(1, PR_SET_SECCOMP);

execv(argv[1], argv + 1);
err(1, argv[1]);
}


--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread H. Peter Anvin
On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
 On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.


 Why wouldn't they?
 
 Is it permissible to fall off the end of a BPF program?  I'm getting
 EINVAL trying to install an actual empty filter.  The filter I tested
 with was:
 

What I meant was that there has to be a well-defined behavior for the
program falling off the end anyway, and that that should be preserved.

I guess it is possible to require that all code paths must provably
reach a termination point.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Andy Lutomirski
On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
 On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.


 Why wouldn't they?

 Is it permissible to fall off the end of a BPF program?  I'm getting
 EINVAL trying to install an actual empty filter.  The filter I tested
 with was:


 What I meant was that there has to be a well-defined behavior for the
 program falling off the end anyway, and that that should be preserved.

 I guess it is possible to require that all code paths must provably
 reach a termination point.


Dunno.  I haven't ever touched any of the actual BPF code.  This whole
patchset only changes the code that invokes the BPF evaluator.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 5/5] x86,seccomp: Add a seccomp fastpath

2014-06-11 Thread Kees Cook
On Wed, Jun 11, 2014 at 3:28 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Wed, Jun 11, 2014 at 3:27 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 03:22 PM, Andy Lutomirski wrote:
 On Wed, Jun 11, 2014 at 3:18 PM, H. Peter Anvin h...@zytor.com wrote:
 On 06/11/2014 02:56 PM, Andy Lutomirski wrote:

 13ns is with the simplest nonempty filter.  I hope that empty filters
 don't work.


 Why wouldn't they?

 Is it permissible to fall off the end of a BPF program?  I'm getting
 EINVAL trying to install an actual empty filter.  The filter I tested
 with was:


 What I meant was that there has to be a well-defined behavior for the
 program falling off the end anyway, and that that should be preserved.

 I guess it is possible to require that all code paths must provably
 reach a termination point.


 Dunno.  I haven't ever touched any of the actual BPF code.  This whole
 patchset only changes the code that invokes the BPF evaluator.

Yes, this is how BPF works: runs to the end or exit early. With
seccomp BPF specifically, the return value defaults to kill the
process. If a filter was missing (NULL), or empty, or didn't
explicitly return with a new value, the default (kill) should be
taken.

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/