Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-25 Thread Tycho Andersen
On Fri, Sep 23, 2016 at 08:34:43PM +0200, Jann Horn wrote:
> On Fri, Sep 23, 2016 at 11:28:26AM -0700, Kees Cook wrote:
> > Does CRIU use this? I wouldn't expect so, since they're using ptrace,
> > IIUC, to freeze/restore.
> 
> As far as I can tell:
> 
> parse_pid_stat() parses them into a struct proc_pid_stat as "esp" and "eip",
> but those struct members are never used (like, probably, most other members
> of that struct).

Yes, that's my reading of it too.

> child_opened_proc.c just opens /proc/%d/stat and then closes it again
> immediately.

This is just a test for ordering of things that are restored, and it
could use any file in /proc, stat was just convenient.

> So in summary: I don't think so.

Yep, agreed.

Tycho


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-25 Thread Tycho Andersen
On Fri, Sep 23, 2016 at 08:34:43PM +0200, Jann Horn wrote:
> On Fri, Sep 23, 2016 at 11:28:26AM -0700, Kees Cook wrote:
> > Does CRIU use this? I wouldn't expect so, since they're using ptrace,
> > IIUC, to freeze/restore.
> 
> As far as I can tell:
> 
> parse_pid_stat() parses them into a struct proc_pid_stat as "esp" and "eip",
> but those struct members are never used (like, probably, most other members
> of that struct).

Yes, that's my reading of it too.

> child_opened_proc.c just opens /proc/%d/stat and then closes it again
> immediately.

This is just a test for ordering of things that are restored, and it
could use any file in /proc, stat was just convenient.

> So in summary: I don't think so.

Yep, agreed.

Tycho


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Jann Horn
On Fri, Sep 23, 2016 at 11:28:26AM -0700, Kees Cook wrote:
> On Fri, Sep 23, 2016 at 12:43 AM, Jann Horn  wrote:
> > On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
> >> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> >> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> >> >> This will prevent a crash if get_wchan() runs after the task stack
> >> >> is freed.
> >> >
> >> > I think I found some more stuff. Have a look at KSTK_EIP() and 
> >> > KSTK_ESP(), I think
> >> > they read from the saved userspace registers area at the top of the 
> >> > kernel stack?
> >> >
> >> > Used on remote processes in:
> >> >   vma_is_stack_for_task() (via /proc/$pid/maps)
> >>
> >> This isn't used in /proc/$pid/maps -- it's only used in
> >> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> >> -- it certainly won't work reliably.
> >>
> >> I could pin the stack in vma_is_stack_for_task, but it seems
> >> potentially better to me to change it to vma_is_stack_for_current()
> >> and remove the offending caller in /proc, replacing it with "return
> >> 0".  Thoughts?
> >
> > I just scrolled through the debian codesearch results for "\[stack\]" -
> > there seem to only be 105 across all of debian's packages, many of them
> > duplicates - and I didn't see any that looked like they used the tid map.
> > So I think this might work.
> >
> > ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )
> >
> >
> >> >   do_task_stat() (/proc/$pid/stat)
> >>
> >> Like this:
> >>
> >> mm = get_task_mm(task);
> >> if (mm) {
> >> vsize = task_vsize(mm);
> >> if (permitted) {
> >> eip = KSTK_EIP(task);
> >> esp = KSTK_ESP(task);
> >> }
> >> }
> >>
> >> Can we just delete this outright?  It seems somewhere between mostly
> >> and entirely useless, and it also seems dangerous.  Until very
> >> recently, on x86_64, this would have been a potential info leak, as
> >> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> >> values to land in task_pt_regs().  I don't even want to think about
> >> what this code does if the task is in vm86 mode.  I wouldn't be at all
> >> surprised if non-x86 architectures have all kinds of interesting
> >> thinks happen if you do this to a task that isn't running normal
> >> non-atomic kernel code at the time.
> >>
> >> I would advocate for unconditionally returning zeros in these two stat 
> >> fields.
> >
> > I'd like that a lot.
> >
> > I guess the two things that might theoretically use it are ptrace users
> > and (very theoretically) sampling profiling stuff or so?
> >
> > In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
> > it's behind an "#ifdef 0":
> >
> >   #if 0   /* Don't know how architecture-dependent the rest is...
> >  Anyway the signal bitmap info is available from "status".  */
> > if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
> >   printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
> > if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
> >   printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
> >   [...]
> >
> > strace and ltrace don't seem to be using it.
> 
> Does CRIU use this? I wouldn't expect so, since they're using ptrace,
> IIUC, to freeze/restore.

As far as I can tell:

parse_pid_stat() parses them into a struct proc_pid_stat as "esp" and "eip",
but those struct members are never used (like, probably, most other members
of that struct).

child_opened_proc.c just opens /proc/%d/stat and then closes it again
immediately.

So in summary: I don't think so.


signature.asc
Description: Digital signature


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Jann Horn
On Fri, Sep 23, 2016 at 11:28:26AM -0700, Kees Cook wrote:
> On Fri, Sep 23, 2016 at 12:43 AM, Jann Horn  wrote:
> > On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
> >> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> >> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> >> >> This will prevent a crash if get_wchan() runs after the task stack
> >> >> is freed.
> >> >
> >> > I think I found some more stuff. Have a look at KSTK_EIP() and 
> >> > KSTK_ESP(), I think
> >> > they read from the saved userspace registers area at the top of the 
> >> > kernel stack?
> >> >
> >> > Used on remote processes in:
> >> >   vma_is_stack_for_task() (via /proc/$pid/maps)
> >>
> >> This isn't used in /proc/$pid/maps -- it's only used in
> >> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> >> -- it certainly won't work reliably.
> >>
> >> I could pin the stack in vma_is_stack_for_task, but it seems
> >> potentially better to me to change it to vma_is_stack_for_current()
> >> and remove the offending caller in /proc, replacing it with "return
> >> 0".  Thoughts?
> >
> > I just scrolled through the debian codesearch results for "\[stack\]" -
> > there seem to only be 105 across all of debian's packages, many of them
> > duplicates - and I didn't see any that looked like they used the tid map.
> > So I think this might work.
> >
> > ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )
> >
> >
> >> >   do_task_stat() (/proc/$pid/stat)
> >>
> >> Like this:
> >>
> >> mm = get_task_mm(task);
> >> if (mm) {
> >> vsize = task_vsize(mm);
> >> if (permitted) {
> >> eip = KSTK_EIP(task);
> >> esp = KSTK_ESP(task);
> >> }
> >> }
> >>
> >> Can we just delete this outright?  It seems somewhere between mostly
> >> and entirely useless, and it also seems dangerous.  Until very
> >> recently, on x86_64, this would have been a potential info leak, as
> >> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> >> values to land in task_pt_regs().  I don't even want to think about
> >> what this code does if the task is in vm86 mode.  I wouldn't be at all
> >> surprised if non-x86 architectures have all kinds of interesting
> >> thinks happen if you do this to a task that isn't running normal
> >> non-atomic kernel code at the time.
> >>
> >> I would advocate for unconditionally returning zeros in these two stat 
> >> fields.
> >
> > I'd like that a lot.
> >
> > I guess the two things that might theoretically use it are ptrace users
> > and (very theoretically) sampling profiling stuff or so?
> >
> > In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
> > it's behind an "#ifdef 0":
> >
> >   #if 0   /* Don't know how architecture-dependent the rest is...
> >  Anyway the signal bitmap info is available from "status".  */
> > if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
> >   printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
> > if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
> >   printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
> >   [...]
> >
> > strace and ltrace don't seem to be using it.
> 
> Does CRIU use this? I wouldn't expect so, since they're using ptrace,
> IIUC, to freeze/restore.

As far as I can tell:

parse_pid_stat() parses them into a struct proc_pid_stat as "esp" and "eip",
but those struct members are never used (like, probably, most other members
of that struct).

child_opened_proc.c just opens /proc/%d/stat and then closes it again
immediately.

So in summary: I don't think so.


signature.asc
Description: Digital signature


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Kees Cook
On Fri, Sep 23, 2016 at 12:43 AM, Jann Horn  wrote:
> On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
>> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
>> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>> >> This will prevent a crash if get_wchan() runs after the task stack
>> >> is freed.
>> >
>> > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), 
>> > I think
>> > they read from the saved userspace registers area at the top of the kernel 
>> > stack?
>> >
>> > Used on remote processes in:
>> >   vma_is_stack_for_task() (via /proc/$pid/maps)
>>
>> This isn't used in /proc/$pid/maps -- it's only used in
>> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
>> -- it certainly won't work reliably.
>>
>> I could pin the stack in vma_is_stack_for_task, but it seems
>> potentially better to me to change it to vma_is_stack_for_current()
>> and remove the offending caller in /proc, replacing it with "return
>> 0".  Thoughts?
>
> I just scrolled through the debian codesearch results for "\[stack\]" -
> there seem to only be 105 across all of debian's packages, many of them
> duplicates - and I didn't see any that looked like they used the tid map.
> So I think this might work.
>
> ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )
>
>
>> >   do_task_stat() (/proc/$pid/stat)
>>
>> Like this:
>>
>> mm = get_task_mm(task);
>> if (mm) {
>> vsize = task_vsize(mm);
>> if (permitted) {
>> eip = KSTK_EIP(task);
>> esp = KSTK_ESP(task);
>> }
>> }
>>
>> Can we just delete this outright?  It seems somewhere between mostly
>> and entirely useless, and it also seems dangerous.  Until very
>> recently, on x86_64, this would have been a potential info leak, as
>> SYSCALL followed closely by a hardware interrupt would cause *kernel*
>> values to land in task_pt_regs().  I don't even want to think about
>> what this code does if the task is in vm86 mode.  I wouldn't be at all
>> surprised if non-x86 architectures have all kinds of interesting
>> thinks happen if you do this to a task that isn't running normal
>> non-atomic kernel code at the time.
>>
>> I would advocate for unconditionally returning zeros in these two stat 
>> fields.
>
> I'd like that a lot.
>
> I guess the two things that might theoretically use it are ptrace users
> and (very theoretically) sampling profiling stuff or so?
>
> In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
> it's behind an "#ifdef 0":
>
>   #if 0   /* Don't know how architecture-dependent the rest is...
>  Anyway the signal bitmap info is available from "status".  */
> if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
>   printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
> if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
>   printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
>   [...]
>
> strace and ltrace don't seem to be using it.

Does CRIU use this? I wouldn't expect so, since they're using ptrace,
IIUC, to freeze/restore.

-Kees

-- 
Kees Cook
Nexus Security


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Kees Cook
On Fri, Sep 23, 2016 at 12:43 AM, Jann Horn  wrote:
> On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
>> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
>> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>> >> This will prevent a crash if get_wchan() runs after the task stack
>> >> is freed.
>> >
>> > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), 
>> > I think
>> > they read from the saved userspace registers area at the top of the kernel 
>> > stack?
>> >
>> > Used on remote processes in:
>> >   vma_is_stack_for_task() (via /proc/$pid/maps)
>>
>> This isn't used in /proc/$pid/maps -- it's only used in
>> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
>> -- it certainly won't work reliably.
>>
>> I could pin the stack in vma_is_stack_for_task, but it seems
>> potentially better to me to change it to vma_is_stack_for_current()
>> and remove the offending caller in /proc, replacing it with "return
>> 0".  Thoughts?
>
> I just scrolled through the debian codesearch results for "\[stack\]" -
> there seem to only be 105 across all of debian's packages, many of them
> duplicates - and I didn't see any that looked like they used the tid map.
> So I think this might work.
>
> ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )
>
>
>> >   do_task_stat() (/proc/$pid/stat)
>>
>> Like this:
>>
>> mm = get_task_mm(task);
>> if (mm) {
>> vsize = task_vsize(mm);
>> if (permitted) {
>> eip = KSTK_EIP(task);
>> esp = KSTK_ESP(task);
>> }
>> }
>>
>> Can we just delete this outright?  It seems somewhere between mostly
>> and entirely useless, and it also seems dangerous.  Until very
>> recently, on x86_64, this would have been a potential info leak, as
>> SYSCALL followed closely by a hardware interrupt would cause *kernel*
>> values to land in task_pt_regs().  I don't even want to think about
>> what this code does if the task is in vm86 mode.  I wouldn't be at all
>> surprised if non-x86 architectures have all kinds of interesting
>> thinks happen if you do this to a task that isn't running normal
>> non-atomic kernel code at the time.
>>
>> I would advocate for unconditionally returning zeros in these two stat 
>> fields.
>
> I'd like that a lot.
>
> I guess the two things that might theoretically use it are ptrace users
> and (very theoretically) sampling profiling stuff or so?
>
> In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
> it's behind an "#ifdef 0":
>
>   #if 0   /* Don't know how architecture-dependent the rest is...
>  Anyway the signal bitmap info is available from "status".  */
> if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
>   printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
> if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
>   printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
>   [...]
>
> strace and ltrace don't seem to be using it.

Does CRIU use this? I wouldn't expect so, since they're using ptrace,
IIUC, to freeze/restore.

-Kees

-- 
Kees Cook
Nexus Security


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Jann Horn
On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> >> This will prevent a crash if get_wchan() runs after the task stack
> >> is freed.
> >
> > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), 
> > I think
> > they read from the saved userspace registers area at the top of the kernel 
> > stack?
> >
> > Used on remote processes in:
> >   vma_is_stack_for_task() (via /proc/$pid/maps)
> 
> This isn't used in /proc/$pid/maps -- it's only used in
> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> -- it certainly won't work reliably.
> 
> I could pin the stack in vma_is_stack_for_task, but it seems
> potentially better to me to change it to vma_is_stack_for_current()
> and remove the offending caller in /proc, replacing it with "return
> 0".  Thoughts?

I just scrolled through the debian codesearch results for "\[stack\]" -
there seem to only be 105 across all of debian's packages, many of them
duplicates - and I didn't see any that looked like they used the tid map.
So I think this might work.

( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )


> >   do_task_stat() (/proc/$pid/stat)
> 
> Like this:
> 
> mm = get_task_mm(task);
> if (mm) {
> vsize = task_vsize(mm);
> if (permitted) {
> eip = KSTK_EIP(task);
> esp = KSTK_ESP(task);
> }
> }
> 
> Can we just delete this outright?  It seems somewhere between mostly
> and entirely useless, and it also seems dangerous.  Until very
> recently, on x86_64, this would have been a potential info leak, as
> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> values to land in task_pt_regs().  I don't even want to think about
> what this code does if the task is in vm86 mode.  I wouldn't be at all
> surprised if non-x86 architectures have all kinds of interesting
> thinks happen if you do this to a task that isn't running normal
> non-atomic kernel code at the time.
> 
> I would advocate for unconditionally returning zeros in these two stat fields.

I'd like that a lot.

I guess the two things that might theoretically use it are ptrace users
and (very theoretically) sampling profiling stuff or so?

In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
it's behind an "#ifdef 0":

  #if 0   /* Don't know how architecture-dependent the rest is...
 Anyway the signal bitmap info is available from "status".  */
if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
  printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
  printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
  [...]

strace and ltrace don't seem to be using it.


signature.asc
Description: Digital signature


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-23 Thread Jann Horn
On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote:
> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> >> This will prevent a crash if get_wchan() runs after the task stack
> >> is freed.
> >
> > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), 
> > I think
> > they read from the saved userspace registers area at the top of the kernel 
> > stack?
> >
> > Used on remote processes in:
> >   vma_is_stack_for_task() (via /proc/$pid/maps)
> 
> This isn't used in /proc/$pid/maps -- it's only used in
> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> -- it certainly won't work reliably.
> 
> I could pin the stack in vma_is_stack_for_task, but it seems
> potentially better to me to change it to vma_is_stack_for_current()
> and remove the offending caller in /proc, replacing it with "return
> 0".  Thoughts?

I just scrolled through the debian codesearch results for "\[stack\]" -
there seem to only be 105 across all of debian's packages, many of them
duplicates - and I didn't see any that looked like they used the tid map.
So I think this might work.

( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 )


> >   do_task_stat() (/proc/$pid/stat)
> 
> Like this:
> 
> mm = get_task_mm(task);
> if (mm) {
> vsize = task_vsize(mm);
> if (permitted) {
> eip = KSTK_EIP(task);
> esp = KSTK_ESP(task);
> }
> }
> 
> Can we just delete this outright?  It seems somewhere between mostly
> and entirely useless, and it also seems dangerous.  Until very
> recently, on x86_64, this would have been a potential info leak, as
> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> values to land in task_pt_regs().  I don't even want to think about
> what this code does if the task is in vm86 mode.  I wouldn't be at all
> surprised if non-x86 architectures have all kinds of interesting
> thinks happen if you do this to a task that isn't running normal
> non-atomic kernel code at the time.
> 
> I would advocate for unconditionally returning zeros in these two stat fields.

I'd like that a lot.

I guess the two things that might theoretically use it are ptrace users
and (very theoretically) sampling profiling stuff or so?

In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but
it's behind an "#ifdef 0":

  #if 0   /* Don't know how architecture-dependent the rest is...
 Anyway the signal bitmap info is available from "status".  */
if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
  printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp);
if (fscanf (procfile, "%lu ", ) > 0) /* FIXME arch?  */
  printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp);
  [...]

strace and ltrace don't seem to be using it.


signature.asc
Description: Digital signature


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-22 Thread Andy Lutomirski
On Thu, Sep 22, 2016 at 3:44 PM, Andy Lutomirski  wrote:
> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
>> On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>>> This will prevent a crash if get_wchan() runs after the task stack
>>> is freed.
>>
>> I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
>> think
>> they read from the saved userspace registers area at the top of the kernel 
>> stack?
>>
>> Used on remote processes in:
>>   vma_is_stack_for_task() (via /proc/$pid/maps)
>
> This isn't used in /proc/$pid/maps -- it's only used in
> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> -- it certainly won't work reliably.
>
> I could pin the stack in vma_is_stack_for_task, but it seems
> potentially better to me to change it to vma_is_stack_for_current()
> and remove the offending caller in /proc, replacing it with "return
> 0".  Thoughts?

The history here is strange:

Before March 2012, we used to only consider the "stack" of the mm --
that is, the VMA that the mm actually treats as a stack.  Then:

commit b76437579d1344b612cf1851ae610c636cec7db0
Author: Siddhesh Poyarekar 
Date:   Wed Mar 21 16:34:04 2012 -0700

procfs: mark thread stack correctly in proc//maps

and we did something extra horrible to try to find out whose stack was
where.  This got partially reverted by:

commit 65376df582174ffcec9e6471bf5b0dd79ba05e4a
Author: Johannes Weiner 
Date:   Tue Feb 2 16:57:29 2016 -0800

proc: revert /proc//maps [stack:TID] annotation

and now we're in the current situation where it's fast but still racy.

Any objection if I finish reverting the patch and restore the pre-2012
behavior?  Frankly, I wouldn't mind trying to excise KSTK_EIP and
KSTK_ESP from the kernel entirely, to be replaced with
current_user_sp() and current_user_ip().  Having those macros around
seems likely to make people think they're safe to use.

>
>>   do_task_stat() (/proc/$pid/stat)
>
> Like this:
>
> mm = get_task_mm(task);
> if (mm) {
> vsize = task_vsize(mm);
> if (permitted) {
> eip = KSTK_EIP(task);
> esp = KSTK_ESP(task);
> }
> }
>
> Can we just delete this outright?  It seems somewhere between mostly
> and entirely useless, and it also seems dangerous.  Until very
> recently, on x86_64, this would have been a potential info leak, as
> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> values to land in task_pt_regs().  I don't even want to think about
> what this code does if the task is in vm86 mode.  I wouldn't be at all
> surprised if non-x86 architectures have all kinds of interesting
> thinks happen if you do this to a task that isn't running normal
> non-atomic kernel code at the time.
>
> I would advocate for unconditionally returning zeros in these two stat fields.

--Andy


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-22 Thread Andy Lutomirski
On Thu, Sep 22, 2016 at 3:44 PM, Andy Lutomirski  wrote:
> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
>> On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>>> This will prevent a crash if get_wchan() runs after the task stack
>>> is freed.
>>
>> I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
>> think
>> they read from the saved userspace registers area at the top of the kernel 
>> stack?
>>
>> Used on remote processes in:
>>   vma_is_stack_for_task() (via /proc/$pid/maps)
>
> This isn't used in /proc/$pid/maps -- it's only used in
> /proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
> -- it certainly won't work reliably.
>
> I could pin the stack in vma_is_stack_for_task, but it seems
> potentially better to me to change it to vma_is_stack_for_current()
> and remove the offending caller in /proc, replacing it with "return
> 0".  Thoughts?

The history here is strange:

Before March 2012, we used to only consider the "stack" of the mm --
that is, the VMA that the mm actually treats as a stack.  Then:

commit b76437579d1344b612cf1851ae610c636cec7db0
Author: Siddhesh Poyarekar 
Date:   Wed Mar 21 16:34:04 2012 -0700

procfs: mark thread stack correctly in proc//maps

and we did something extra horrible to try to find out whose stack was
where.  This got partially reverted by:

commit 65376df582174ffcec9e6471bf5b0dd79ba05e4a
Author: Johannes Weiner 
Date:   Tue Feb 2 16:57:29 2016 -0800

proc: revert /proc//maps [stack:TID] annotation

and now we're in the current situation where it's fast but still racy.

Any objection if I finish reverting the patch and restore the pre-2012
behavior?  Frankly, I wouldn't mind trying to excise KSTK_EIP and
KSTK_ESP from the kernel entirely, to be replaced with
current_user_sp() and current_user_ip().  Having those macros around
seems likely to make people think they're safe to use.

>
>>   do_task_stat() (/proc/$pid/stat)
>
> Like this:
>
> mm = get_task_mm(task);
> if (mm) {
> vsize = task_vsize(mm);
> if (permitted) {
> eip = KSTK_EIP(task);
> esp = KSTK_ESP(task);
> }
> }
>
> Can we just delete this outright?  It seems somewhere between mostly
> and entirely useless, and it also seems dangerous.  Until very
> recently, on x86_64, this would have been a potential info leak, as
> SYSCALL followed closely by a hardware interrupt would cause *kernel*
> values to land in task_pt_regs().  I don't even want to think about
> what this code does if the task is in vm86 mode.  I wouldn't be at all
> surprised if non-x86 architectures have all kinds of interesting
> thinks happen if you do this to a task that isn't running normal
> non-atomic kernel code at the time.
>
> I would advocate for unconditionally returning zeros in these two stat fields.

--Andy


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-22 Thread Andy Lutomirski
On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>> This will prevent a crash if get_wchan() runs after the task stack
>> is freed.
>
> I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
> think
> they read from the saved userspace registers area at the top of the kernel 
> stack?
>
> Used on remote processes in:
>   vma_is_stack_for_task() (via /proc/$pid/maps)

This isn't used in /proc/$pid/maps -- it's only used in
/proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
-- it certainly won't work reliably.

I could pin the stack in vma_is_stack_for_task, but it seems
potentially better to me to change it to vma_is_stack_for_current()
and remove the offending caller in /proc, replacing it with "return
0".  Thoughts?

>   do_task_stat() (/proc/$pid/stat)

Like this:

mm = get_task_mm(task);
if (mm) {
vsize = task_vsize(mm);
if (permitted) {
eip = KSTK_EIP(task);
esp = KSTK_ESP(task);
}
}

Can we just delete this outright?  It seems somewhere between mostly
and entirely useless, and it also seems dangerous.  Until very
recently, on x86_64, this would have been a potential info leak, as
SYSCALL followed closely by a hardware interrupt would cause *kernel*
values to land in task_pt_regs().  I don't even want to think about
what this code does if the task is in vm86 mode.  I wouldn't be at all
surprised if non-x86 architectures have all kinds of interesting
thinks happen if you do this to a task that isn't running normal
non-atomic kernel code at the time.

I would advocate for unconditionally returning zeros in these two stat fields.


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-22 Thread Andy Lutomirski
On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn  wrote:
> On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
>> This will prevent a crash if get_wchan() runs after the task stack
>> is freed.
>
> I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
> think
> they read from the saved userspace registers area at the top of the kernel 
> stack?
>
> Used on remote processes in:
>   vma_is_stack_for_task() (via /proc/$pid/maps)

This isn't used in /proc/$pid/maps -- it's only used in
/proc/$pid/task/$tid/maps.  I wonder if anyone actually cares about it
-- it certainly won't work reliably.

I could pin the stack in vma_is_stack_for_task, but it seems
potentially better to me to change it to vma_is_stack_for_current()
and remove the offending caller in /proc, replacing it with "return
0".  Thoughts?

>   do_task_stat() (/proc/$pid/stat)

Like this:

mm = get_task_mm(task);
if (mm) {
vsize = task_vsize(mm);
if (permitted) {
eip = KSTK_EIP(task);
esp = KSTK_ESP(task);
}
}

Can we just delete this outright?  It seems somewhere between mostly
and entirely useless, and it also seems dangerous.  Until very
recently, on x86_64, this would have been a potential info leak, as
SYSCALL followed closely by a hardware interrupt would cause *kernel*
values to land in task_pt_regs().  I don't even want to think about
what this code does if the task is in vm86 mode.  I wouldn't be at all
surprised if non-x86 architectures have all kinds of interesting
thinks happen if you do this to a task that isn't running normal
non-atomic kernel code at the time.

I would advocate for unconditionally returning zeros in these two stat fields.


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-16 Thread Jann Horn
On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> This will prevent a crash if get_wchan() runs after the task stack
> is freed.

I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
think
they read from the saved userspace registers area at the top of the kernel 
stack?

Used on remote processes in:
  vma_is_stack_for_task() (via /proc/$pid/maps)
  do_task_stat() (/proc/$pid/stat)


signature.asc
Description: Digital signature


Re: [PATCH 09/12] x86/process: Pin the target stack in get_wchan()

2016-09-16 Thread Jann Horn
On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote:
> This will prevent a crash if get_wchan() runs after the task stack
> is freed.

I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I 
think
they read from the saved userspace registers area at the top of the kernel 
stack?

Used on remote processes in:
  vma_is_stack_for_task() (via /proc/$pid/maps)
  do_task_stat() (/proc/$pid/stat)


signature.asc
Description: Digital signature