Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 02:53 PM, Philipp Kern wrote:
> On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski  wrote:
>> However: are you sure that entry_64.S handles this?  It looks like
>> tracesys has higher priority than badsys.  And strace can certainly
>> see out-of-range syscalls. […]
> 
> Not only can it see them: It must see that this bit is set as that's
> the only identifier it has to deduce that the binary is running in x32
> mode.

Yes... keep in mind the ABI is a local property: just because the binary
is running in x32 mode doesn't mean it can't access x86-64 or i386
system calls (or similar for x86-64 processes.)  A process started in
i386 mode can transition to long mode and execute x86-64 or x32 system
calls, too.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:53 PM, Philipp Kern  wrote:
> On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski  wrote:
>> However: are you sure that entry_64.S handles this?  It looks like
>> tracesys has higher priority than badsys.  And strace can certainly
>> see out-of-range syscalls. […]
>
> Not only can it see them: It must see that this bit is set as that's
> the only identifier it has to deduce that the binary is running in x32
> mode.
>
> Out of range syscall numbers certainly do not work for auditing right
> now, hence my attempt to patch around it.

There appears to be a completely arbitrary limit of 32*64 syscalls.
There's also an arbitrary limit of 4 arguments.  Both are wrong.  I
have no intention of fixing either.

I'll fix the OOPS, though.

>
> Kind regards and thanks
> Philipp Kern



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski  wrote:
> However: are you sure that entry_64.S handles this?  It looks like
> tracesys has higher priority than badsys.  And strace can certainly
> see out-of-range syscalls. […]

Not only can it see them: It must see that this bit is set as that's
the only identifier it has to deduce that the binary is running in x32
mode.

Out of range syscall numbers certainly do not work for auditing right
now, hence my attempt to patch around it.

Kind regards and thanks
Philipp Kern
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:19 PM, H. Peter Anvin  wrote:
> On 05/28/2014 02:15 PM, Andy Lutomirski wrote:
>>>
 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?
>>>
>>> There is a range check in entry_*.S for the system call.
>>
>> I can imagine that causing a certain amount of confusion to fancy
>> seccomp users.  Oh, well.  No one that I know of has complained yet.
>>
>
> I don't know how seccomp or audit deal with out-of-range syscall
> numbers, so that might be worth taking a look at.

audit oopses, apparently.  seccomp will tell BPF about it and follow
directions, barring bugs.

However: are you sure that entry_64.S handles this?  It looks like
tracesys has higher priority than badsys.  And strace can certainly
see out-of-range syscalls.  And I can oops audit by doing:

auditctl -a exit,always -S open
./a.out

where a.out is this:

#include 
#include 

int main()
{
  long i;
  for (i = 1000; ; i += 64 * 4096)
syscall(i, 0, 0, 0, 0, 0, 0);
  return 0;
}

I would *love* to deprecate the entire syscall auditing mechanism.  Or
at least I'd love to have distros stop using it.

I'll go ask for a CVE number.  Sigh.


--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 02:15 PM, Andy Lutomirski wrote:
>>
>>> 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
>>> other random high bits are set?
>>
>> There is a range check in entry_*.S for the system call.
> 
> I can imagine that causing a certain amount of confusion to fancy
> seccomp users.  Oh, well.  No one that I know of has complained yet.
> 

I don't know how seccomp or audit deal with out-of-range syscall
numbers, so that might be worth taking a look at.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:01 PM, H. Peter Anvin  wrote:
> On 05/28/2014 01:47 PM, Andy Lutomirski wrote:
>> On 05/28/2014 05:19 AM, Philipp Kern wrote:
>>> audit_filter_syscall uses the syscall number to reference into a
>>> bitmask (e->rule.mask[word]). Not removing the x32 bit before passing
>>> the number to this architecture independent codepath will fail to
>>> lookup the proper audit bit. Furthermore it will cause an invalid memory
>>> access in the kernel if the out of bound location is not mapped:
>>>
>>>   BUG: unable to handle kernel paging request at 8800e5446630
>>>   IP: [] audit_filter_syscall+0x90/0xf0
>>>
>>> Together with the entrypoint in entry_64.S this change causes x32
>>> programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
>>> on the syscall path.
>>>
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: H. J. Lu 
>>> Cc: Eric Paris 
>>> Signed-off-by: Philipp Kern 
>>> ---
>>>  arch/x86/kernel/ptrace.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
>>> index 678c0ad..166a3c7 100644
>>> --- a/arch/x86/kernel/ptrace.c
>>> +++ b/arch/x86/kernel/ptrace.c
>>> @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>>>
>>>  if (IS_IA32)
>>>  audit_syscall_entry(AUDIT_ARCH_I386,
>>> -regs->orig_ax,
>>> +regs->orig_ax & __SYSCALL_MASK,
>>
>> This is weird.  Three questions:
>>
>> 1. How can this happen?  I thought that x32 syscalls always came in
>> through the syscall path, which doesn't set is_compat_task.  (Can
>> someone rename is_compat_task to in_compat_syscall?  Pretty please?)
>
> The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both
> as TS_COMPAT and bit 30 of orig_ax.
>
> I think what is really needed here is IS_IA32 should use is_ia32_task()
> instead, and *that* is the context we can mask off the x32 bit in at
> all.  However, does audit not need that information?

This is thoroughly fscked up.

What *should* have happened is that there should have been an
AUDIT_ARCH_X32.  Unfortunately no one caught that in time, so the
current enshrined ABI is that, as far as seccomp is concerned, x32 is
AUDIT_ARCH_X86_64 (see syscall_get_arch) but nr has the x32 bit set.

But the audit code is different: x32 is AUDIT_ARCH_I386 and the x32
bit is set.  This may be changeable, since fixes to the audit ABI may
not break anything.  Can we just use syscall_get_arch to determine the
token to pass to the audit code?

If it makes everyone feel better, I think that every single
architecture has screwed this up.  We actually had to prevent seccomp
and audit from being configured in on any OABI-supporting ARM kernel,
and MIPS almost did the same thing that x32 did.  We caught that one
on time, and it's now fixed in Linus' tree.

>
> (And why the frakk does audit receive the first four syscall arguments?
>  Audit seems like the worst turd ever...)

If you ever benchmark the performance impact of merely running auditd,
you might have to upgrade that insult :-/

>
>> 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
>> inconsistent with seccomp.  This seems wrong.
>
> This is completely and totally bogus, indeed.

I'd suggest just fixing the bug in auditsc.c.

>
>> 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
>> other random high bits are set?
>
> There is a range check in entry_*.S for the system call.

I can imagine that causing a certain amount of confusion to fancy
seccomp users.  Oh, well.  No one that I know of has complained yet.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 01:47 PM, Andy Lutomirski wrote:
> On 05/28/2014 05:19 AM, Philipp Kern wrote:
>> audit_filter_syscall uses the syscall number to reference into a
>> bitmask (e->rule.mask[word]). Not removing the x32 bit before passing
>> the number to this architecture independent codepath will fail to
>> lookup the proper audit bit. Furthermore it will cause an invalid memory
>> access in the kernel if the out of bound location is not mapped:
>>
>>   BUG: unable to handle kernel paging request at 8800e5446630
>>   IP: [] audit_filter_syscall+0x90/0xf0
>>
>> Together with the entrypoint in entry_64.S this change causes x32
>> programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
>> on the syscall path.
>>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: H. J. Lu 
>> Cc: Eric Paris 
>> Signed-off-by: Philipp Kern 
>> ---
>>  arch/x86/kernel/ptrace.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
>> index 678c0ad..166a3c7 100644
>> --- a/arch/x86/kernel/ptrace.c
>> +++ b/arch/x86/kernel/ptrace.c
>> @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>>  
>>  if (IS_IA32)
>>  audit_syscall_entry(AUDIT_ARCH_I386,
>> -regs->orig_ax,
>> +regs->orig_ax & __SYSCALL_MASK,
> 
> This is weird.  Three questions:
> 
> 1. How can this happen?  I thought that x32 syscalls always came in
> through the syscall path, which doesn't set is_compat_task.  (Can
> someone rename is_compat_task to in_compat_syscall?  Pretty please?)

The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both
as TS_COMPAT and bit 30 of orig_ax.

I think what is really needed here is IS_IA32 should use is_ia32_task()
instead, and *that* is the context we can mask off the x32 bit in at
all.  However, does audit not need that information?

(And why the frakk does audit receive the first four syscall arguments?
 Audit seems like the worst turd ever...)

> 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
> inconsistent with seccomp.  This seems wrong.

This is completely and totally bogus, indeed.

> 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
> other random high bits are set?

There is a range check in entry_*.S for the system call.

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
On Wed, May 28, 2014 at 10:47 PM, Andy Lutomirski  wrote:
> On 05/28/2014 05:19 AM, Philipp Kern wrote:
> > audit_filter_syscall uses the syscall number to reference into a
> > bitmask (e->rule.mask[word]). Not removing the x32 bit before passing
> > the number to this architecture independent codepath will fail to
> > lookup the proper audit bit. Furthermore it will cause an invalid memory
> > access in the kernel if the out of bound location is not mapped:
> >
> >   BUG: unable to handle kernel paging request at 8800e5446630
> >   IP: [] audit_filter_syscall+0x90/0xf0
> >
> > Together with the entrypoint in entry_64.S this change causes x32
> > programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
> > on the syscall path.
> >
> > Cc: linux-kernel@vger.kernel.org
> > Cc: H. J. Lu 
> > Cc: Eric Paris 
> > Signed-off-by: Philipp Kern 
> > ---
> >  arch/x86/kernel/ptrace.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> > index 678c0ad..166a3c7 100644
> > --- a/arch/x86/kernel/ptrace.c
> > +++ b/arch/x86/kernel/ptrace.c
> > @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
> >
> >   if (IS_IA32)
> >   audit_syscall_entry(AUDIT_ARCH_I386,
> > - regs->orig_ax,
> > + regs->orig_ax & __SYSCALL_MASK,
>
> This is weird.  Three questions:
>
> 1. How can this happen?  I thought that x32 syscalls always came in
> through the syscall path, which doesn't set is_compat_task.  (Can
> someone rename is_compat_task to in_compat_syscall?  Pretty please?)

The other patch is this one: https://lkml.org/lkml/2014/5/26/209

But I agree: IS_IA32 is confusing but it does trigger on x32 tasks. I
put that into a bug report but apparently not into the patch
description, I'm sorry. is_compat_task is defined as is_ia32_task() ||
is_x32_task().

> 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
> inconsistent with seccomp.  This seems wrong.

I'm not sure where seccomp is hooked in. Can you point me to the entry
points here? ptrace still requires that userspace sees the x32 bit so
it cannot be masked beforehand. The path the other patch fixes masks
the bit away directly after the audit call.

> 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
> other random high bits are set?

Fair point. I guess we should check against AUDIT_BITMASK_SIZE as that
is the size of the array we are referencing into. But what would we do
in case it's larger? Those are not ENOSYS'ed yet in the entry_64.S
path. That's after tracing and auditing.

Kind regards and thanks
Philipp Kern
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On 05/28/2014 05:19 AM, Philipp Kern wrote:
> audit_filter_syscall uses the syscall number to reference into a
> bitmask (e->rule.mask[word]). Not removing the x32 bit before passing
> the number to this architecture independent codepath will fail to
> lookup the proper audit bit. Furthermore it will cause an invalid memory
> access in the kernel if the out of bound location is not mapped:
> 
>   BUG: unable to handle kernel paging request at 8800e5446630
>   IP: [] audit_filter_syscall+0x90/0xf0
> 
> Together with the entrypoint in entry_64.S this change causes x32
> programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
> on the syscall path.
> 
> Cc: linux-kernel@vger.kernel.org
> Cc: H. J. Lu 
> Cc: Eric Paris 
> Signed-off-by: Philipp Kern 
> ---
>  arch/x86/kernel/ptrace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 678c0ad..166a3c7 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
>  
>   if (IS_IA32)
>   audit_syscall_entry(AUDIT_ARCH_I386,
> - regs->orig_ax,
> + regs->orig_ax & __SYSCALL_MASK,

This is weird.  Three questions:

1. How can this happen?  I thought that x32 syscalls always came in
through the syscall path, which doesn't set is_compat_task.  (Can
someone rename is_compat_task to in_compat_syscall?  Pretty please?)

2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
inconsistent with seccomp.  This seems wrong.

3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
other random high bits are set?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
audit_filter_syscall uses the syscall number to reference into a
bitmask (e->rule.mask[word]). Not removing the x32 bit before passing
the number to this architecture independent codepath will fail to
lookup the proper audit bit. Furthermore it will cause an invalid memory
access in the kernel if the out of bound location is not mapped:

  BUG: unable to handle kernel paging request at 8800e5446630
  IP: [] audit_filter_syscall+0x90/0xf0

Together with the entrypoint in entry_64.S this change causes x32
programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
on the syscall path.

Cc: linux-kernel@vger.kernel.org
Cc: H. J. Lu 
Cc: Eric Paris 
Signed-off-by: Philipp Kern 
---
 arch/x86/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 678c0ad..166a3c7 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 
if (IS_IA32)
audit_syscall_entry(AUDIT_ARCH_I386,
-   regs->orig_ax,
+   regs->orig_ax & __SYSCALL_MASK,
regs->bx, regs->cx,
regs->dx, regs->si);
 #ifdef CONFIG_X86_64
-- 
1.9.1.423.g4596e3a

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On 05/28/2014 05:19 AM, Philipp Kern wrote:
 audit_filter_syscall uses the syscall number to reference into a
 bitmask (e-rule.mask[word]). Not removing the x32 bit before passing
 the number to this architecture independent codepath will fail to
 lookup the proper audit bit. Furthermore it will cause an invalid memory
 access in the kernel if the out of bound location is not mapped:
 
   BUG: unable to handle kernel paging request at 8800e5446630
   IP: [810fcdd0] audit_filter_syscall+0x90/0xf0
 
 Together with the entrypoint in entry_64.S this change causes x32
 programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
 on the syscall path.
 
 Cc: linux-kernel@vger.kernel.org
 Cc: H. J. Lu hjl.to...@gmail.com
 Cc: Eric Paris epa...@redhat.com
 Signed-off-by: Philipp Kern pk...@google.com
 ---
  arch/x86/kernel/ptrace.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
 index 678c0ad..166a3c7 100644
 --- a/arch/x86/kernel/ptrace.c
 +++ b/arch/x86/kernel/ptrace.c
 @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
  
   if (IS_IA32)
   audit_syscall_entry(AUDIT_ARCH_I386,
 - regs-orig_ax,
 + regs-orig_ax  __SYSCALL_MASK,

This is weird.  Three questions:

1. How can this happen?  I thought that x32 syscalls always came in
through the syscall path, which doesn't set is_compat_task.  (Can
someone rename is_compat_task to in_compat_syscall?  Pretty please?)

2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
inconsistent with seccomp.  This seems wrong.

3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
other random high bits are set?

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
On Wed, May 28, 2014 at 10:47 PM, Andy Lutomirski l...@amacapital.net wrote:
 On 05/28/2014 05:19 AM, Philipp Kern wrote:
  audit_filter_syscall uses the syscall number to reference into a
  bitmask (e-rule.mask[word]). Not removing the x32 bit before passing
  the number to this architecture independent codepath will fail to
  lookup the proper audit bit. Furthermore it will cause an invalid memory
  access in the kernel if the out of bound location is not mapped:
 
BUG: unable to handle kernel paging request at 8800e5446630
IP: [810fcdd0] audit_filter_syscall+0x90/0xf0
 
  Together with the entrypoint in entry_64.S this change causes x32
  programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
  on the syscall path.
 
  Cc: linux-kernel@vger.kernel.org
  Cc: H. J. Lu hjl.to...@gmail.com
  Cc: Eric Paris epa...@redhat.com
  Signed-off-by: Philipp Kern pk...@google.com
  ---
   arch/x86/kernel/ptrace.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
  index 678c0ad..166a3c7 100644
  --- a/arch/x86/kernel/ptrace.c
  +++ b/arch/x86/kernel/ptrace.c
  @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 
if (IS_IA32)
audit_syscall_entry(AUDIT_ARCH_I386,
  - regs-orig_ax,
  + regs-orig_ax  __SYSCALL_MASK,

 This is weird.  Three questions:

 1. How can this happen?  I thought that x32 syscalls always came in
 through the syscall path, which doesn't set is_compat_task.  (Can
 someone rename is_compat_task to in_compat_syscall?  Pretty please?)

The other patch is this one: https://lkml.org/lkml/2014/5/26/209

But I agree: IS_IA32 is confusing but it does trigger on x32 tasks. I
put that into a bug report but apparently not into the patch
description, I'm sorry. is_compat_task is defined as is_ia32_task() ||
is_x32_task().

 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
 inconsistent with seccomp.  This seems wrong.

I'm not sure where seccomp is hooked in. Can you point me to the entry
points here? ptrace still requires that userspace sees the x32 bit so
it cannot be masked beforehand. The path the other patch fixes masks
the bit away directly after the audit call.

 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?

Fair point. I guess we should check against AUDIT_BITMASK_SIZE as that
is the size of the array we are referencing into. But what would we do
in case it's larger? Those are not ENOSYS'ed yet in the entry_64.S
path. That's after tracing and auditing.

Kind regards and thanks
Philipp Kern
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 01:47 PM, Andy Lutomirski wrote:
 On 05/28/2014 05:19 AM, Philipp Kern wrote:
 audit_filter_syscall uses the syscall number to reference into a
 bitmask (e-rule.mask[word]). Not removing the x32 bit before passing
 the number to this architecture independent codepath will fail to
 lookup the proper audit bit. Furthermore it will cause an invalid memory
 access in the kernel if the out of bound location is not mapped:

   BUG: unable to handle kernel paging request at 8800e5446630
   IP: [810fcdd0] audit_filter_syscall+0x90/0xf0

 Together with the entrypoint in entry_64.S this change causes x32
 programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
 on the syscall path.

 Cc: linux-kernel@vger.kernel.org
 Cc: H. J. Lu hjl.to...@gmail.com
 Cc: Eric Paris epa...@redhat.com
 Signed-off-by: Philipp Kern pk...@google.com
 ---
  arch/x86/kernel/ptrace.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
 index 678c0ad..166a3c7 100644
 --- a/arch/x86/kernel/ptrace.c
 +++ b/arch/x86/kernel/ptrace.c
 @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
  
  if (IS_IA32)
  audit_syscall_entry(AUDIT_ARCH_I386,
 -regs-orig_ax,
 +regs-orig_ax  __SYSCALL_MASK,
 
 This is weird.  Three questions:
 
 1. How can this happen?  I thought that x32 syscalls always came in
 through the syscall path, which doesn't set is_compat_task.  (Can
 someone rename is_compat_task to in_compat_syscall?  Pretty please?)

The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both
as TS_COMPAT and bit 30 of orig_ax.

I think what is really needed here is IS_IA32 should use is_ia32_task()
instead, and *that* is the context we can mask off the x32 bit in at
all.  However, does audit not need that information?

(And why the frakk does audit receive the first four syscall arguments?
 Audit seems like the worst turd ever...)

 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
 inconsistent with seccomp.  This seems wrong.

This is completely and totally bogus, indeed.

 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?

There is a range check in entry_*.S for the system call.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:01 PM, H. Peter Anvin h...@linux.intel.com wrote:
 On 05/28/2014 01:47 PM, Andy Lutomirski wrote:
 On 05/28/2014 05:19 AM, Philipp Kern wrote:
 audit_filter_syscall uses the syscall number to reference into a
 bitmask (e-rule.mask[word]). Not removing the x32 bit before passing
 the number to this architecture independent codepath will fail to
 lookup the proper audit bit. Furthermore it will cause an invalid memory
 access in the kernel if the out of bound location is not mapped:

   BUG: unable to handle kernel paging request at 8800e5446630
   IP: [810fcdd0] audit_filter_syscall+0x90/0xf0

 Together with the entrypoint in entry_64.S this change causes x32
 programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
 on the syscall path.

 Cc: linux-kernel@vger.kernel.org
 Cc: H. J. Lu hjl.to...@gmail.com
 Cc: Eric Paris epa...@redhat.com
 Signed-off-by: Philipp Kern pk...@google.com
 ---
  arch/x86/kernel/ptrace.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
 index 678c0ad..166a3c7 100644
 --- a/arch/x86/kernel/ptrace.c
 +++ b/arch/x86/kernel/ptrace.c
 @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)

  if (IS_IA32)
  audit_syscall_entry(AUDIT_ARCH_I386,
 -regs-orig_ax,
 +regs-orig_ax  __SYSCALL_MASK,

 This is weird.  Three questions:

 1. How can this happen?  I thought that x32 syscalls always came in
 through the syscall path, which doesn't set is_compat_task.  (Can
 someone rename is_compat_task to in_compat_syscall?  Pretty please?)

 The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both
 as TS_COMPAT and bit 30 of orig_ax.

 I think what is really needed here is IS_IA32 should use is_ia32_task()
 instead, and *that* is the context we can mask off the x32 bit in at
 all.  However, does audit not need that information?

This is thoroughly fscked up.

What *should* have happened is that there should have been an
AUDIT_ARCH_X32.  Unfortunately no one caught that in time, so the
current enshrined ABI is that, as far as seccomp is concerned, x32 is
AUDIT_ARCH_X86_64 (see syscall_get_arch) but nr has the x32 bit set.

But the audit code is different: x32 is AUDIT_ARCH_I386 and the x32
bit is set.  This may be changeable, since fixes to the audit ABI may
not break anything.  Can we just use syscall_get_arch to determine the
token to pass to the audit code?

If it makes everyone feel better, I think that every single
architecture has screwed this up.  We actually had to prevent seccomp
and audit from being configured in on any OABI-supporting ARM kernel,
and MIPS almost did the same thing that x32 did.  We caught that one
on time, and it's now fixed in Linus' tree.


 (And why the frakk does audit receive the first four syscall arguments?
  Audit seems like the worst turd ever...)

If you ever benchmark the performance impact of merely running auditd,
you might have to upgrade that insult :-/


 2. Now audit can't tell whether a syscall is x32 or i386.  And audit is
 inconsistent with seccomp.  This seems wrong.

 This is completely and totally bogus, indeed.

I'd suggest just fixing the bug in auditsc.c.


 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?

 There is a range check in entry_*.S for the system call.

I can imagine that causing a certain amount of confusion to fancy
seccomp users.  Oh, well.  No one that I know of has complained yet.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 02:15 PM, Andy Lutomirski wrote:

 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?

 There is a range check in entry_*.S for the system call.
 
 I can imagine that causing a certain amount of confusion to fancy
 seccomp users.  Oh, well.  No one that I know of has complained yet.
 

I don't know how seccomp or audit deal with out-of-range syscall
numbers, so that might be worth taking a look at.

-hpa


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:19 PM, H. Peter Anvin h...@linux.intel.com wrote:
 On 05/28/2014 02:15 PM, Andy Lutomirski wrote:

 3. The OOPS you're fixing doesn't seem like it's fixed.  What if some
 other random high bits are set?

 There is a range check in entry_*.S for the system call.

 I can imagine that causing a certain amount of confusion to fancy
 seccomp users.  Oh, well.  No one that I know of has complained yet.


 I don't know how seccomp or audit deal with out-of-range syscall
 numbers, so that might be worth taking a look at.

audit oopses, apparently.  seccomp will tell BPF about it and follow
directions, barring bugs.

However: are you sure that entry_64.S handles this?  It looks like
tracesys has higher priority than badsys.  And strace can certainly
see out-of-range syscalls.  And I can oops audit by doing:

auditctl -a exit,always -S open
./a.out

where a.out is this:

#include stdio.h
#include sys/syscall.h

int main()
{
  long i;
  for (i = 1000; ; i += 64 * 4096)
syscall(i, 0, 0, 0, 0, 0, 0);
  return 0;
}

I would *love* to deprecate the entire syscall auditing mechanism.  Or
at least I'd love to have distros stop using it.

I'll go ask for a CVE number.  Sigh.


--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski l...@amacapital.net wrote:
 However: are you sure that entry_64.S handles this?  It looks like
 tracesys has higher priority than badsys.  And strace can certainly
 see out-of-range syscalls. […]

Not only can it see them: It must see that this bit is set as that's
the only identifier it has to deduce that the binary is running in x32
mode.

Out of range syscall numbers certainly do not work for auditing right
now, hence my attempt to patch around it.

Kind regards and thanks
Philipp Kern
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Andy Lutomirski
On Wed, May 28, 2014 at 2:53 PM, Philipp Kern pk...@google.com wrote:
 On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski l...@amacapital.net wrote:
 However: are you sure that entry_64.S handles this?  It looks like
 tracesys has higher priority than badsys.  And strace can certainly
 see out-of-range syscalls. […]

 Not only can it see them: It must see that this bit is set as that's
 the only identifier it has to deduce that the binary is running in x32
 mode.

 Out of range syscall numbers certainly do not work for auditing right
 now, hence my attempt to patch around it.

There appears to be a completely arbitrary limit of 32*64 syscalls.
There's also an arbitrary limit of 4 arguments.  Both are wrong.  I
have no intention of fixing either.

I'll fix the OOPS, though.


 Kind regards and thanks
 Philipp Kern



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread H. Peter Anvin
On 05/28/2014 02:53 PM, Philipp Kern wrote:
 On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski l...@amacapital.net wrote:
 However: are you sure that entry_64.S handles this?  It looks like
 tracesys has higher priority than badsys.  And strace can certainly
 see out-of-range syscalls. […]
 
 Not only can it see them: It must see that this bit is set as that's
 the only identifier it has to deduce that the binary is running in x32
 mode.

Yes... keep in mind the ABI is a local property: just because the binary
is running in x32 mode doesn't mean it can't access x86-64 or i386
system calls (or similar for x86-64 processes.)  A process started in
i386 mode can transition to long mode and execute x86-64 or x32 system
calls, too.

-hpa

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath

2014-05-28 Thread Philipp Kern
audit_filter_syscall uses the syscall number to reference into a
bitmask (e-rule.mask[word]). Not removing the x32 bit before passing
the number to this architecture independent codepath will fail to
lookup the proper audit bit. Furthermore it will cause an invalid memory
access in the kernel if the out of bound location is not mapped:

  BUG: unable to handle kernel paging request at 8800e5446630
  IP: [810fcdd0] audit_filter_syscall+0x90/0xf0

Together with the entrypoint in entry_64.S this change causes x32
programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending
on the syscall path.

Cc: linux-kernel@vger.kernel.org
Cc: H. J. Lu hjl.to...@gmail.com
Cc: Eric Paris epa...@redhat.com
Signed-off-by: Philipp Kern pk...@google.com
---
 arch/x86/kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 678c0ad..166a3c7 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs)
 
if (IS_IA32)
audit_syscall_entry(AUDIT_ARCH_I386,
-   regs-orig_ax,
+   regs-orig_ax  __SYSCALL_MASK,
regs-bx, regs-cx,
regs-dx, regs-si);
 #ifdef CONFIG_X86_64
-- 
1.9.1.423.g4596e3a

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/