from:"Pedro Alves"

Re: [RFC PATCH] ptrace: make ptrace() fail if the tracee changed its pid unexpectedly

2020-12-19 Thread Pedro Alves

On 12/17/20 11:39 PM, Eric W. Biederman wrote:

>> resume the old leader L, it resumes the post-exec thread T which was
>> actually now stopped in PTHREAD_EVENT_EXEC. In this case the
>> PTHREAD_EVENT_EXEC event is lost, and the tracer can't know that the
>> tracee changed its pid.
> 
> The change seems sensible.  I don't expect this is common but it looks
> painful to deal with if it happens.

Yeah, the debug session becomes completely messed up, as the ptracer has no
idea the process is running a new image, breakpoints were wiped out, and
the post-exec process is resumed without the ptracer having had a chance
to install new breakpoints.  I don't see any way to deal with it without
kernel help.

> 
> Acked-by: "Eric W. Biederman" 
> 
> I am wondering if this should be expanded to all ptrace types for
> consistency.  Or maybe we should set a flag to make this happen for
> all ptrace events.
> 
> It just seems really odd to only worry about missing this event.
> I admit this a threaded PTRACE_EVENT_EXEC is the only event we are
> likely to miss but still.

It's more about the tid stealing than the event itself.  I mean,
we lose the event because the tid changes magically without warning.
An exec is the only scenario where this happens.

> 
> Do you by any chance have any debugger/strace test cases?
> 
> I would think that would be the way to test to see if this breaks
> anything.  I think I remember strace having a good test suite.

I ran the GDB testsuite against this, no regressions showed up.

BTW, the problem was discovered by Simon Marchi when he tried to write
a GDB testcase for a multi-threaded exec scenario:

 https://sourceware.org/bugzilla/show_bug.cgi?id=26754

I've went through GDB's code looking for potential issues with the change and 
whether
it would affect GDBs already in the wild.  Tricky corner cases abound, but I 
think
we're good.  Feel free to add my ack:

Acked-by: Pedro Alves 

Thanks,
Pedro Alves

Re: [PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr

2016-06-22 Thread Pedro Alves

On 06/21/2016 05:32 PM, Andy Lutomirski wrote:
> On Jun 21, 2016 5:40 AM, "Pedro Alves" <pal...@redhat.com> wrote:

> I didn't try that particular experiment.  But, from that email:
> 
>> After that, GDB can control the stopped inferior. To call function "func1()" 
>> of inferior, GDB need: Step 1, save current values of registers ($rax 
>> 0xfe00(64 bits -512) is cut to 0xfe00(32 bits -512) because 
>> inferior is a 32 bits program).
> 
> That sounds like it may be a gdb bug.  Why does gdb truncate the register?

Because when debugging a 32-bit program, gdb's register cache only
stores 32-bit-wide registers ($eax, $eip, etc., not $rax, etc.)

Let me turn this around:

Why does the kernel care about the upper 32-bit bits of $orig_rax when
the task is in 32-bit mode in the first place?

The 32-bit syscall entry points already only care about $eax, not $rax,
since $rax doesn't exist in real 32-bit CPUs.

Looking at arch/x86/entry/entry_64_compat.S, all three 64-bit x 32-bit syscall
entry points zero-extend $eax already, and then push that as pt_regs->orig_ax.

So if the kernel is giving significance to the higher 32-bits of orig_ax at
32-bit syscall restart time, that very much looks like a kernel bug to me.

> 
> I haven't played with it recently, but, in my experience, gdb seems to
> work quite poorly in mixed-mode situations.  For example, if you
> attach 64-bit gdb to qemu-system-x86_64's gdbserver, boot a 64-bit
> guest, and breakpoint in early 32-bit code, gdb tends to explode
> pretty badly.

Right, but that's a bit of a red herring, and not entirely gdb's fault.  The 
case you
mention happens because qemu does exactly the opposite of what you're 
suggesting below.
It's qemu that changes the remote protocol's register layout (and thus size) in 
the
remote protocol register read/write packets when the kernel changes mode, behind
gdb's back, and gdb errors out because obviously it isn't expecting that.  All 
gdb
knows is that qemu is now sending bogus register read replies, different from 
what
the target description qemu reported on initial remote connection described.

I say "entirely", because gdb has its own share of fault for the remote protocol
not including a some kind of standard mechanism to inform gdb of mode changes.

However, the usual scenario where the program _doesn't_ change mode
during execution, is supported.

> 
> On x86_64, I think gdb should treat CPU state as 64-bit no matter
> what.  The fact that a 32-bit tracee code segment is in use shouldn't
> change much.

It's not as clear or easy as you make it sound, unfortunately.

For normal userspace programs, the current design across
gdb/remote protocol/ptrace/kernel/core dump format/elf formats/ is that
what matters is the program's architecture, not whatever the tracer's arch
is.

Should core dumping dump 64-bit CPU state as well for 32-bit programs?
The current core dump format dumps a 32-bit elf with notes that contain
32-bit registers.  And I think it'd be a bit odd for a 32-bit program to
dump different cores files depending on the bitness of the kernel.

Should a gdb connected to a 64-bit gdbserver that is debugging a 32-bit
program see different registers compared to a gdb that
is connected to a 32-bit gdbserver that is debugging a 32-bit program?
Currently, it doesn't.  The architecture of gdbserver doesn't matter here,
only the tracee's.

> Admittedly the kernel doesn't really help.  There is some questionable
> code involving which regsets to show to ptrace.

I don't know what code you're looking at, but I consider this mandatory reading:

 Roland McGrath on PTRACE_GETREGSET design:
 https://sourceware.org/ml/archer/2010-q3/msg00193.html

  "A caveat about those requests for bi-arch systems.  Unlike other
  ptrace requests, these access the native formats of the tracee
  process, rather than the native formats of the debugger process.
  So, a 64-bit debugger process using PTRACE_GETREGSET on a 32-bit
  tracee process will see the 32-bit layouts (i.e. what would appear
  in an ELF core file if that process dumped one)."

gdb currently uses PTRACE_SETREGS for the general registers, which
means it currently writes those as 64-bit registers.  However, if gdb or any
ptracer restores/writes $eax/$orig_eax using PTRACE_SETREGSET, it's only
going to pass down a 32-bit value, and again it must be the kernel that sign
extends $orig_eax if it wants to interpret it as signed 64-bit internally.

But actually looking at your patch 3, I'm confused, because it seems
to be doing what I'm suggesting?

Thanks,
Pedro Alves

Re: [PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr

2016-06-22 Thread Pedro Alves

On 06/21/2016 05:32 PM, Andy Lutomirski wrote:
> On Jun 21, 2016 5:40 AM, "Pedro Alves"  wrote:

> I didn't try that particular experiment.  But, from that email:
> 
>> After that, GDB can control the stopped inferior. To call function "func1()" 
>> of inferior, GDB need: Step 1, save current values of registers ($rax 
>> 0xfe00(64 bits -512) is cut to 0xfe00(32 bits -512) because 
>> inferior is a 32 bits program).
> 
> That sounds like it may be a gdb bug.  Why does gdb truncate the register?

Because when debugging a 32-bit program, gdb's register cache only
stores 32-bit-wide registers ($eax, $eip, etc., not $rax, etc.)

Let me turn this around:

Why does the kernel care about the upper 32-bit bits of $orig_rax when
the task is in 32-bit mode in the first place?

The 32-bit syscall entry points already only care about $eax, not $rax,
since $rax doesn't exist in real 32-bit CPUs.

Looking at arch/x86/entry/entry_64_compat.S, all three 64-bit x 32-bit syscall
entry points zero-extend $eax already, and then push that as pt_regs->orig_ax.

So if the kernel is giving significance to the higher 32-bits of orig_ax at
32-bit syscall restart time, that very much looks like a kernel bug to me.

> 
> I haven't played with it recently, but, in my experience, gdb seems to
> work quite poorly in mixed-mode situations.  For example, if you
> attach 64-bit gdb to qemu-system-x86_64's gdbserver, boot a 64-bit
> guest, and breakpoint in early 32-bit code, gdb tends to explode
> pretty badly.

Right, but that's a bit of a red herring, and not entirely gdb's fault.  The 
case you
mention happens because qemu does exactly the opposite of what you're 
suggesting below.
It's qemu that changes the remote protocol's register layout (and thus size) in 
the
remote protocol register read/write packets when the kernel changes mode, behind
gdb's back, and gdb errors out because obviously it isn't expecting that.  All 
gdb
knows is that qemu is now sending bogus register read replies, different from 
what
the target description qemu reported on initial remote connection described.

I say "entirely", because gdb has its own share of fault for the remote protocol
not including a some kind of standard mechanism to inform gdb of mode changes.

However, the usual scenario where the program _doesn't_ change mode
during execution, is supported.

> 
> On x86_64, I think gdb should treat CPU state as 64-bit no matter
> what.  The fact that a 32-bit tracee code segment is in use shouldn't
> change much.

It's not as clear or easy as you make it sound, unfortunately.

For normal userspace programs, the current design across
gdb/remote protocol/ptrace/kernel/core dump format/elf formats/ is that
what matters is the program's architecture, not whatever the tracer's arch
is.

Should core dumping dump 64-bit CPU state as well for 32-bit programs?
The current core dump format dumps a 32-bit elf with notes that contain
32-bit registers.  And I think it'd be a bit odd for a 32-bit program to
dump different cores files depending on the bitness of the kernel.

Should a gdb connected to a 64-bit gdbserver that is debugging a 32-bit
program see different registers compared to a gdb that
is connected to a 32-bit gdbserver that is debugging a 32-bit program?
Currently, it doesn't.  The architecture of gdbserver doesn't matter here,
only the tracee's.

> Admittedly the kernel doesn't really help.  There is some questionable
> code involving which regsets to show to ptrace.

I don't know what code you're looking at, but I consider this mandatory reading:

 Roland McGrath on PTRACE_GETREGSET design:
 https://sourceware.org/ml/archer/2010-q3/msg00193.html

  "A caveat about those requests for bi-arch systems.  Unlike other
  ptrace requests, these access the native formats of the tracee
  process, rather than the native formats of the debugger process.
  So, a 64-bit debugger process using PTRACE_GETREGSET on a 32-bit
  tracee process will see the 32-bit layouts (i.e. what would appear
  in an ELF core file if that process dumped one)."

gdb currently uses PTRACE_SETREGS for the general registers, which
means it currently writes those as 64-bit registers.  However, if gdb or any
ptracer restores/writes $eax/$orig_eax using PTRACE_SETREGSET, it's only
going to pass down a 32-bit value, and again it must be the kernel that sign
extends $orig_eax if it wants to interpret it as signed 64-bit internally.

But actually looking at your patch 3, I'm confused, because it seems
to be doing what I'm suggesting?

Thanks,
Pedro Alves

Re: [PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr

2016-06-21 Thread Pedro Alves

Hi Andy,

On 06/21/2016 12:39 AM, Andy Lutomirski wrote:
> Suppose a 64-bit task A traces a 32-bit task B.

I gave your x86/ptrace branch a try:

 https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/ptrace

(this looks to be the same patch set.)

Unfortunately, with gdb git master, I still get the
64-bit ptracer x 32-bit ptracee problem:

 (gdb) r
 Starting program: interrupt.32 
 talk to me baby
 ^C

 Program received signal SIGINT, Interrupt.
 0xf7fd9d09 in __kernel_vsyscall ()
 (gdb) p func1 ()
 $1 = 4
 (gdb) c
 Continuing.
 Unknown error 512
 [Inferior 1 (process 2198) exited with code 01]
 (gdb) q

Is this expected?

This is the same testcase as before:

 https://sourceware.org/ml/gdb/2014-05/msg4.html

Thanks,
Pedro Alves

Re: [PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr

2016-06-21 Thread Pedro Alves

Hi Andy,

On 06/21/2016 12:39 AM, Andy Lutomirski wrote:
> Suppose a 64-bit task A traces a 32-bit task B.

I gave your x86/ptrace branch a try:

 https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/ptrace

(this looks to be the same patch set.)

Unfortunately, with gdb git master, I still get the
64-bit ptracer x 32-bit ptracee problem:

 (gdb) r
 Starting program: interrupt.32 
 talk to me baby
 ^C

 Program received signal SIGINT, Interrupt.
 0xf7fd9d09 in __kernel_vsyscall ()
 (gdb) p func1 ()
 $1 = 4
 (gdb) c
 Continuing.
 Unknown error 512
 [Inferior 1 (process 2198) exited with code 01]
 (gdb) q

Is this expected?

This is the same testcase as before:

 https://sourceware.org/ml/gdb/2014-05/msg4.html

Thanks,
Pedro Alves

Re: [PATCH v2] x86/ptrace: Stop setting TS_COMPAT in ptrace code

2016-06-20 Thread Pedro Alves

On 06/20/2016 08:37 PM, Andy Lutomirski wrote:
> On Mon, Jun 20, 2016 at 9:29 AM, Andy Lutomirski <l...@kernel.org> wrote:
>> Setting TS_COMPAT in ptrace is wrong: if we happen to do it during
>> syscall entry, then we'll confuse seccomp and audit.  (The former
>> isn't a security problem: seccomp is currently entirely insecure if a
>> malicious ptracer is attached.)  As a minimal fix, this patch adds a
>> new flag TS_I386_REGS_POKED that handles the ptrace special case.
>>
>> Cc: Pedro Alves <pal...@redhat.com>
>> Cc: Oleg Nesterov <o...@redhat.com>
>> Cc: Kees Cook <keesc...@chromium.org>
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> 
> In case you're interested, my draft followup (definitely not for x86/urgent) 
> is:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=50d2f2a9fe1b
> 
> Pedro, this appears to pass ptrace-tests.  I need to try the 64-vs-32
> thing, but it's intended to fix it for real.

Awesome, thanks much for working on this!

Thanks,
Pedro Alves

Re: [PATCH v2] x86/ptrace: Stop setting TS_COMPAT in ptrace code

2016-06-20 Thread Pedro Alves

On 06/20/2016 08:37 PM, Andy Lutomirski wrote:
> On Mon, Jun 20, 2016 at 9:29 AM, Andy Lutomirski  wrote:
>> Setting TS_COMPAT in ptrace is wrong: if we happen to do it during
>> syscall entry, then we'll confuse seccomp and audit.  (The former
>> isn't a security problem: seccomp is currently entirely insecure if a
>> malicious ptracer is attached.)  As a minimal fix, this patch adds a
>> new flag TS_I386_REGS_POKED that handles the ptrace special case.
>>
>> Cc: Pedro Alves 
>> Cc: Oleg Nesterov 
>> Cc: Kees Cook 
>> Signed-off-by: Andy Lutomirski 
> 
> In case you're interested, my draft followup (definitely not for x86/urgent) 
> is:
> 
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack=50d2f2a9fe1b
> 
> Pedro, this appears to pass ptrace-tests.  I need to try the 64-vs-32
> thing, but it's intended to fix it for real.

Awesome, thanks much for working on this!

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-20 Thread Pedro Alves

On 06/19/2016 11:09 PM, Andy Lutomirski wrote:
> 
> The latter bit is a mess and is probably broken on current kernels for
> 64-bit gdb attached to a 32-bit process.  (Is it?  All of this stuff
> is a bit of a pain to test.)

The testcase at:

 https://sourceware.org/ml/gdb/2014-05/msg4.html

still fails for me on Fedora 23 with git master gdb.

Nevermind the misleading URL, that's a kernel patch.

 $ gcc -g -m32 interrupt.c -o interrupt.32
 ...
 (gdb) r
 Starting program: /home/pedro/tmp/interrupt.32 
 talk to me baby
 ^C
 Program received signal SIGINT, Interrupt.
 0xf7fd9d49 in __kernel_vsyscall ()
 (gdb) p func1()
 $1 = 4
 (gdb) c
 Continuing.
 Unknown error 512
 [Inferior 1 (process 20252) exited with code 01]
 (gdb) 

That was a 64-bit gdb.

Note it doesn't fail with fedora 23's gdb, because of a 
fedora-local workaround.

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-20 Thread Pedro Alves

On 06/19/2016 11:09 PM, Andy Lutomirski wrote:
> 
> The latter bit is a mess and is probably broken on current kernels for
> 64-bit gdb attached to a 32-bit process.  (Is it?  All of this stuff
> is a bit of a pain to test.)

The testcase at:

 https://sourceware.org/ml/gdb/2014-05/msg4.html

still fails for me on Fedora 23 with git master gdb.

Nevermind the misleading URL, that's a kernel patch.

 $ gcc -g -m32 interrupt.c -o interrupt.32
 ...
 (gdb) r
 Starting program: /home/pedro/tmp/interrupt.32 
 talk to me baby
 ^C
 Program received signal SIGINT, Interrupt.
 0xf7fd9d49 in __kernel_vsyscall ()
 (gdb) p func1()
 $1 = 4
 (gdb) c
 Continuing.
 Unknown error 512
 [Inferior 1 (process 20252) exited with code 01]
 (gdb) 

That was a 64-bit gdb.

Note it doesn't fail with fedora 23's gdb, because of a 
fedora-local workaround.

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-20 Thread Pedro Alves

On 06/18/2016 06:02 PM, Andy Lutomirski wrote:

> Yuck.  I should have dug in to the history.  Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?

No idea.

> 
> Do you know how to acquire a copy of erestartsys-trap.c?  The old
> links appear to be broken.

That's part of the ptrace testsuite project, still in cvs, though the url 
changed:

 $ https://sourceware.org/systemtap/wiki/utrace/tests

 $ cvs -d :pserver:anoncvs:anon...@sourceware.org:/cvs/systemtap co ptrace-tests

Can't seem to find a cvsweb interface for that.

I think it'd be great to move these to the selftests infrastructure
directly in the kernel tree.  However, nobody's has ever managed to
find energy for that.

> 
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once?  I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.
> 

It depends on which register you're accessing, and on kernel version.
But on a recent kernel, it should be using PTRACE_SETREGS / PTRACE_SETREGSET,
thus storing a whole register set in one go.  (And it's likely we could
get rid of PTRACE_POKE fallback paths by now.)
To write to the debug registers (dr0-dr7), PTRACE_POKEUSER is always used.

There's code that coordinates with glibc's libthread_db.so that ends
up _reading_ fs_base/gs_base, and gdb uses PTRACE_PEEKUSER for that,
though there's a pending series that changes it, exposing fs_base/gs_base
as just another register in gdb's register cache:

 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00076.html
 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00077.html

Guess that makes fs_base/gs_base user-writable, if the kernel allows it.

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-20 Thread Pedro Alves

On 06/18/2016 06:02 PM, Andy Lutomirski wrote:

> Yuck.  I should have dug in to the history.  Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?

No idea.

> 
> Do you know how to acquire a copy of erestartsys-trap.c?  The old
> links appear to be broken.

That's part of the ptrace testsuite project, still in cvs, though the url 
changed:

 $ https://sourceware.org/systemtap/wiki/utrace/tests

 $ cvs -d :pserver:anoncvs:anon...@sourceware.org:/cvs/systemtap co ptrace-tests

Can't seem to find a cvsweb interface for that.

I think it'd be great to move these to the selftests infrastructure
directly in the kernel tree.  However, nobody's has ever managed to
find energy for that.

> 
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once?  I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.
> 

It depends on which register you're accessing, and on kernel version.
But on a recent kernel, it should be using PTRACE_SETREGS / PTRACE_SETREGSET,
thus storing a whole register set in one go.  (And it's likely we could
get rid of PTRACE_POKE fallback paths by now.)
To write to the debug registers (dr0-dr7), PTRACE_POKEUSER is always used.

There's code that coordinates with glibc's libthread_db.so that ends
up _reading_ fs_base/gs_base, and gdb uses PTRACE_PEEKUSER for that,
though there's a pending series that changes it, exposing fs_base/gs_base
as just another register in gdb's register cache:

 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00076.html
 https://www.sourceware.org/ml/gdb-patches/2015-11/msg00077.html

Guess that makes fs_base/gs_base user-writable, if the kernel allows it.

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-18 Thread Pedro Alves

On 06/18/2016 02:55 PM, Pedro Alves wrote:

> This hunk being mentioned in this thread a couple years ago too:
> 
>  https://www.sourceware.org/ml/gdb/2014-04/msg00095.html
> 
> Please don't break this use case ( and fix the one reported in
> that thread :-) ).

BTW, there was a follow up v2 patch to that last url, here:

 [PATCH v2] Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB
 https://sourceware.org/ml/gdb/2014-05/msg4.html

(for some reason, I can only find the ping on the lkml, not the
 original post, though it was apparently CCed.)

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-18 Thread Pedro Alves

On 06/18/2016 02:55 PM, Pedro Alves wrote:

> This hunk being mentioned in this thread a couple years ago too:
> 
>  https://www.sourceware.org/ml/gdb/2014-04/msg00095.html
> 
> Please don't break this use case ( and fix the one reported in
> that thread :-) ).

BTW, there was a follow up v2 patch to that last url, here:

 [PATCH v2] Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB
 https://sourceware.org/ml/gdb/2014-05/msg4.html

(for some reason, I can only find the ping on the lkml, not the
 original post, though it was apparently CCed.)

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-18 Thread Pedro Alves

On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
> A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
> 
> - If the tracee is stopped in a 32-bit syscall, this has no effect
>   as TS_COMPAT will already be set.
> 
> - If the tracee is stopped on entry to a 64-bit syscall, this could
>   cause problems: in_compat_syscall() etc will be out of sync with
>   the actual syscall table in use.  I can imagine this bre aking
>   audit.  (It can't meaningfully break seccomp, as a malicious
>   tracer can already fully bypass seccomp.)  I could also imagine it
>   subtly confusing the implementations of a few syscalls.
> 
>  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>will end up set in user mode, which isn't supposed to happen.  The results
>are likely to be similar to the 64-bit syscall entry case.
> 
> Fix it by deleting the code.
> 
> Here's my understanding of the previous intent.  Suppose a
> 32-bit tracee makes a syscall that returns one of the -ERESTART
> codes.  A 32-bit tracer saves away its register state.  The tracee
> resumes, returns from the system call, and gets stopped again for a
> non-syscall reason (e.g. a signal).  Then the tracer tries to roll
> back the tracee's state by writing all of the saved registers back.
> 
> The result of this sequence of events will be that the tracee's
> registers' low bits match what they were at the end of the syscall
> but TS_COMPAT will be clear.  This will cause syscall_get_error() to
> return a *positive* number, because we zero-extend registers poked
> by 32-bit tracers instead of sign-extending them.  As a result, with
> this change, syscall restart won't happen, whereas, historically, it
> would have happened.
> 
> As far as I can tell, this corner case isn't very important, and I

I believe it's actually very much very important for gdb, for restoring
the inferior state when the user calls a function in the inferior, with:

 (gdb) print foo()

Some background here:

 
http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0

This hunk being mentioned in this thread a couple years ago too:

 https://www.sourceware.org/ml/gdb/2014-04/msg00095.html

Please don't break this use case ( and fix the one reported in
that thread :-) ).

Thanks,
Pedro Alves

Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

2016-06-18 Thread Pedro Alves

On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
> A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
> 
> - If the tracee is stopped in a 32-bit syscall, this has no effect
>   as TS_COMPAT will already be set.
> 
> - If the tracee is stopped on entry to a 64-bit syscall, this could
>   cause problems: in_compat_syscall() etc will be out of sync with
>   the actual syscall table in use.  I can imagine this bre aking
>   audit.  (It can't meaningfully break seccomp, as a malicious
>   tracer can already fully bypass seccomp.)  I could also imagine it
>   subtly confusing the implementations of a few syscalls.
> 
>  - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>will end up set in user mode, which isn't supposed to happen.  The results
>are likely to be similar to the 64-bit syscall entry case.
> 
> Fix it by deleting the code.
> 
> Here's my understanding of the previous intent.  Suppose a
> 32-bit tracee makes a syscall that returns one of the -ERESTART
> codes.  A 32-bit tracer saves away its register state.  The tracee
> resumes, returns from the system call, and gets stopped again for a
> non-syscall reason (e.g. a signal).  Then the tracer tries to roll
> back the tracee's state by writing all of the saved registers back.
> 
> The result of this sequence of events will be that the tracee's
> registers' low bits match what they were at the end of the syscall
> but TS_COMPAT will be clear.  This will cause syscall_get_error() to
> return a *positive* number, because we zero-extend registers poked
> by 32-bit tracers instead of sign-extending them.  As a result, with
> this change, syscall restart won't happen, whereas, historically, it
> would have happened.
> 
> As far as I can tell, this corner case isn't very important, and I

I believe it's actually very much very important for gdb, for restoring
the inferior state when the user calls a function in the inferior, with:

 (gdb) print foo()

Some background here:

 
http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0

This hunk being mentioned in this thread a couple years ago too:

 https://www.sourceware.org/ml/gdb/2014-04/msg00095.html

Please don't break this use case ( and fix the one reported in
that thread :-) ).

Thanks,
Pedro Alves

Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

2015-11-19 Thread Pedro Alves

On 11/19/2015 05:47 PM, Oleg Nesterov wrote:
> Thanks Pedro for your email,
> 

>>  918   /* We need to wait for SIGSTOP before being able to make the next
>>  919  ptrace call on this LWP.  */
>>  920   new_lwp->must_set_ptrace_flags = 1;
>>  921
>>  922   if (linux_proc_pid_is_stopped (lwpid))
> 
> This can't happen today. Starting from v3.0 at least.

Eh, interesting.  So right after PTRACE_ATTACH, we either observe
"running" or "ptrace-stopped", but never "job stopped".  Correct?

I've actually just now tried this:

diff --git c/gdb/linux-nat.c w/gdb/linux-nat.c
index 841ec39..42f2b0d 100644
--- c/gdb/linux-nat.c
+++ w/gdb/linux-nat.c
@@ -981,6 +981,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int 
*cloned,
   pid_t new_pid, pid = ptid_get_lwp (ptid);
   int status;

+#if 0
   if (linux_proc_pid_is_stopped (pid))
 {
   if (debug_linux_nat)
@@ -1006,6 +1007,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int 
*cloned,
 (or a higher priority signal, just like normal PTRACE_ATTACH).  */
   ptrace (PTRACE_CONT, pid, 0, 0);
 }
+#endif

   /* Make sure the initial process is stopped.  The user-level threads
  layer might want to poke around in the inferior, and that won't

and sure enough, gdb's test that covers that use case still
passes, on Fedora 20 (Linux 3.19.8).

And given that my Thunderbird crashed while writing this, I had sufficient
time to be sure that a full test run passes cleanly too.  :-P  :-)

>> But maybe not, if we're sure that
>> that when that happens, waitpid returns for the initial
>> PTRACE_ATTACH-induced SIGSTOP.
> 
> Yes. Just you can't assume that watpid(WNOHANG) will succeed. Is it OK?

Yes, assuming the ptracer is guaranteed to get a SIGCHLD to wake it up.

Thanks,
Pedro Alves
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

2015-11-19 Thread Pedro Alves

On 11/17/2015 07:34 PM, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:

>>> And perhaps we can simply remove this logic? I forgot why do we hide this
>>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
>>> vague feeling tells me that we discussed this before and perhaps it was me
>>> who suggested to avoid the user-visible change when you introduced this
>>> transition...
>>
>> Heh, it was too long ago for me to remember much. :)
> 
> Same here...
> 
>>> Anyway, now I do not understand why do we want to hide it. Lets consider
>>> the following "test-case",
>>>
>>> void test(int pid)
>>> {
>>> kill(pid, SIGSTOP);
>>> waitpid(pid, NULL, WSTOPPED);
>>>
>>> ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
>>>
>>> assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
>>> }
>>>
>>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
>>> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>>>
>>> Jan, Pedro, do you think the patch below can break gdb somehow? With this
>>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
>>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
>>> tracee was TASK_STOPPED before attach.

Not sure, because I don't think I fully understand that proposed change.

Both GDB and gdbserver have special processing for attaching to already-stopped
processes.  (and neither use PTRACE_SEIZE yet.)

Here's the gdbserver version:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903

Copied here for convenience:

 907 linux_attach_lwp (ptid_t ptid)
 908 {
 909   struct lwp_info *new_lwp;
 910   int lwpid = ptid_get_lwp (ptid);
 911
 912   if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, 
(PTRACE_TYPE_ARG4) 0)
 913   != 0)
 914 return errno;
 915
 916   new_lwp = add_lwp (ptid);
 917
 918   /* We need to wait for SIGSTOP before being able to make the next
 919  ptrace call on this LWP.  */
 920   new_lwp->must_set_ptrace_flags = 1;
 921
 922   if (linux_proc_pid_is_stopped (lwpid))
 923 {
 924   if (debug_threads)
 925 debug_printf ("Attached to a stopped process\n");
 926
 927   /* The process is definitely stopped.  It is in a job control
 928  stop, unless the kernel predates the TASK_STOPPED /
 929  TASK_TRACED distinction, in which case it might be in a
 930  ptrace stop.  Make sure it is in a ptrace stop; from there we
 931  can kill it, signal it, et cetera.
 932
 933  First make sure there is a pending SIGSTOP.  Since we are
 934  already attached, the process can not transition from stopped
 935  to running without a PTRACE_CONT; so we know this signal will
 936  go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
 937  probably already in the queue (unless this kernel is old
 938  enough to use TASK_STOPPED for ptrace stops); but since
 939  SIGSTOP is not an RT signal, it can only be queued once.  */
 940   kill_lwp (lwpid, SIGSTOP);
 941
 942   /* Finally, resume the stopped process.  This will deliver the
 943  SIGSTOP (or a higher priority signal, just like normal
 944  PTRACE_ATTACH), which we'll catch later on.  */
 945   ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 
0);
 946 }
 947
 948   /* The next time we wait for this LWP we'll see a SIGSTOP as 
PTRACE_ATTACH
 949  brings it to a halt.
 950

linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T 
(stopped)".

Here's the equivalent in gdb:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974

This queuing of a SIGSTOP + PTRACE_CONT was necessary because
otherwise when gdb attaches to a job stopped process, gdb would hang in the 
waitpid
after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

If the proposed change makes it so that a new intermediate state can be observed
right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
then there's potential for breakage.  But maybe not, if we're sure that
that when that happens, waitpid returns for the initial
PTRACE_ATTACH-induced SIGSTOP.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

2015-11-19 Thread Pedro Alves

On 11/19/2015 05:47 PM, Oleg Nesterov wrote:
> Thanks Pedro for your email,
> 

>>  918   /* We need to wait for SIGSTOP before being able to make the next
>>  919  ptrace call on this LWP.  */
>>  920   new_lwp->must_set_ptrace_flags = 1;
>>  921
>>  922   if (linux_proc_pid_is_stopped (lwpid))
> 
> This can't happen today. Starting from v3.0 at least.

Eh, interesting.  So right after PTRACE_ATTACH, we either observe
"running" or "ptrace-stopped", but never "job stopped".  Correct?

I've actually just now tried this:

diff --git c/gdb/linux-nat.c w/gdb/linux-nat.c
index 841ec39..42f2b0d 100644
--- c/gdb/linux-nat.c
+++ w/gdb/linux-nat.c
@@ -981,6 +981,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int 
*cloned,
   pid_t new_pid, pid = ptid_get_lwp (ptid);
   int status;

+#if 0
   if (linux_proc_pid_is_stopped (pid))
 {
   if (debug_linux_nat)
@@ -1006,6 +1007,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int 
*cloned,
 (or a higher priority signal, just like normal PTRACE_ATTACH).  */
   ptrace (PTRACE_CONT, pid, 0, 0);
 }
+#endif

   /* Make sure the initial process is stopped.  The user-level threads
  layer might want to poke around in the inferior, and that won't

and sure enough, gdb's test that covers that use case still
passes, on Fedora 20 (Linux 3.19.8).

And given that my Thunderbird crashed while writing this, I had sufficient
time to be sure that a full test run passes cleanly too.  :-P  :-)

>> But maybe not, if we're sure that
>> that when that happens, waitpid returns for the initial
>> PTRACE_ATTACH-induced SIGSTOP.
> 
> Yes. Just you can't assume that watpid(WNOHANG) will succeed. Is it OK?

Yes, assuming the ptracer is guaranteed to get a SIGCHLD to wake it up.

Thanks,
Pedro Alves
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

2015-11-19 Thread Pedro Alves

On 11/17/2015 07:34 PM, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:

>>> And perhaps we can simply remove this logic? I forgot why do we hide this
>>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
>>> vague feeling tells me that we discussed this before and perhaps it was me
>>> who suggested to avoid the user-visible change when you introduced this
>>> transition...
>>
>> Heh, it was too long ago for me to remember much. :)
> 
> Same here...
> 
>>> Anyway, now I do not understand why do we want to hide it. Lets consider
>>> the following "test-case",
>>>
>>> void test(int pid)
>>> {
>>> kill(pid, SIGSTOP);
>>> waitpid(pid, NULL, WSTOPPED);
>>>
>>> ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
>>>
>>> assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
>>> }
>>>
>>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
>>> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>>>
>>> Jan, Pedro, do you think the patch below can break gdb somehow? With this
>>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
>>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
>>> tracee was TASK_STOPPED before attach.

Not sure, because I don't think I fully understand that proposed change.

Both GDB and gdbserver have special processing for attaching to already-stopped
processes.  (and neither use PTRACE_SEIZE yet.)

Here's the gdbserver version:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903

Copied here for convenience:

 907 linux_attach_lwp (ptid_t ptid)
 908 {
 909   struct lwp_info *new_lwp;
 910   int lwpid = ptid_get_lwp (ptid);
 911
 912   if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, 
(PTRACE_TYPE_ARG4) 0)
 913   != 0)
 914 return errno;
 915
 916   new_lwp = add_lwp (ptid);
 917
 918   /* We need to wait for SIGSTOP before being able to make the next
 919  ptrace call on this LWP.  */
 920   new_lwp->must_set_ptrace_flags = 1;
 921
 922   if (linux_proc_pid_is_stopped (lwpid))
 923 {
 924   if (debug_threads)
 925 debug_printf ("Attached to a stopped process\n");
 926
 927   /* The process is definitely stopped.  It is in a job control
 928  stop, unless the kernel predates the TASK_STOPPED /
 929  TASK_TRACED distinction, in which case it might be in a
 930  ptrace stop.  Make sure it is in a ptrace stop; from there we
 931  can kill it, signal it, et cetera.
 932
 933  First make sure there is a pending SIGSTOP.  Since we are
 934  already attached, the process can not transition from stopped
 935  to running without a PTRACE_CONT; so we know this signal will
 936  go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
 937  probably already in the queue (unless this kernel is old
 938  enough to use TASK_STOPPED for ptrace stops); but since
 939  SIGSTOP is not an RT signal, it can only be queued once.  */
 940   kill_lwp (lwpid, SIGSTOP);
 941
 942   /* Finally, resume the stopped process.  This will deliver the
 943  SIGSTOP (or a higher priority signal, just like normal
 944  PTRACE_ATTACH), which we'll catch later on.  */
 945   ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 
0);
 946 }
 947
 948   /* The next time we wait for this LWP we'll see a SIGSTOP as 
PTRACE_ATTACH
 949  brings it to a halt.
 950

linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T 
(stopped)".

Here's the equivalent in gdb:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974

This queuing of a SIGSTOP + PTRACE_CONT was necessary because
otherwise when gdb attaches to a job stopped process, gdb would hang in the 
waitpid
after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

If the proposed change makes it so that a new intermediate state can be observed
right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
then there's potential for breakage.  But maybe not, if we're sure that
that when that happens, waitpid returns for the initial
PTRACE_ATTACH-induced SIGSTOP.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-28 Thread Pedro Alves

On 10/28/2015 04:11 PM, Oleg Nesterov wrote:
> On 10/26, Pedro Alves wrote:
>>
>> On 10/25/2015 03:54 PM, Oleg Nesterov wrote:
>>>
>>> In any case, the real question is whether we should change the kernel to
>>> fix the problem, or ask the distros to fix their init's. In the former
>>> case 1/2 looks simpler/safer to me than the change in ptrace_traceme(),
>>> and you seem to agree that 1/2 is not that bad.
>>
>> A risk here seems to be that waitpid will start returning unexpected
>> (thread) PIDs to parent processes,
> 
> I don't see how this change can make the things worse,
> 
>> and it's not unreasonable to assume
>> that e.g., a program asserts that waitpid either returns error or a
>> known (process) PID.
> 
> Well. /sbin/init can never assume this, obviously.

Right.  I was actually thinking of !init processes -- basically code
that spawns helper processes, keeps a data structure indexed by pid, then
discards the structure when the child exits.  Something like:

 pid = waitpid(-1, , 0);
 if (pid > 0)
 {
   struct child_process *child = find_process(pid);
   assert (child != NULL);
 }

As in, before your change, the child could get stuck forever, but after your
change, the parent could die/assert instead.

But ...

> 
>> That's not an init-only issue,
> 
> Yes. Because we have CLONE_PARENT. So "waitpid either returns error or a
> known (process) PID" is only true if you trust your children.

... OK, that's indeed a good point.

>> (Also, in the original test case, if the child gets/raises a signal or execs
>> before exiting, the bash/init/whatever process won't be issuing PTRACE_CONT,
>> and the child will thus end up stuck (though should be SIGKILLable,
>
> Oh, but if it is killable everything is fine. How does this differ from the
> case when, say, you jusr reparent to init and do kill(getpid(), SIGSTOP) ?

The difference is that if the child called PTRACE_TRACEME, then it goes
to ptrace-stop instead and no amount of SIGCONT unstucks it -- the only way
out is force killing.  I agree it's not a major issue as there's a way out
(and thus made it a parens), but I wouldn't call it nice either.

>> All this because PTRACE_TRACEME is broken by design
>
> Heh. I agree. But we can't fix it now.

Perhaps the man page could document it as deprecated, suggesting
PTRACE_ATTACH/PTRACE_SEIZE instead?

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-28 Thread Pedro Alves

On 10/28/2015 04:11 PM, Oleg Nesterov wrote:
> On 10/26, Pedro Alves wrote:
>>
>> On 10/25/2015 03:54 PM, Oleg Nesterov wrote:
>>>
>>> In any case, the real question is whether we should change the kernel to
>>> fix the problem, or ask the distros to fix their init's. In the former
>>> case 1/2 looks simpler/safer to me than the change in ptrace_traceme(),
>>> and you seem to agree that 1/2 is not that bad.
>>
>> A risk here seems to be that waitpid will start returning unexpected
>> (thread) PIDs to parent processes,
> 
> I don't see how this change can make the things worse,
> 
>> and it's not unreasonable to assume
>> that e.g., a program asserts that waitpid either returns error or a
>> known (process) PID.
> 
> Well. /sbin/init can never assume this, obviously.

Right.  I was actually thinking of !init processes -- basically code
that spawns helper processes, keeps a data structure indexed by pid, then
discards the structure when the child exits.  Something like:

 pid = waitpid(-1, , 0);
 if (pid > 0)
 {
   struct child_process *child = find_process(pid);
   assert (child != NULL);
 }

As in, before your change, the child could get stuck forever, but after your
change, the parent could die/assert instead.

But ...

> 
>> That's not an init-only issue,
> 
> Yes. Because we have CLONE_PARENT. So "waitpid either returns error or a
> known (process) PID" is only true if you trust your children.

... OK, that's indeed a good point.

>> (Also, in the original test case, if the child gets/raises a signal or execs
>> before exiting, the bash/init/whatever process won't be issuing PTRACE_CONT,
>> and the child will thus end up stuck (though should be SIGKILLable,
>
> Oh, but if it is killable everything is fine. How does this differ from the
> case when, say, you jusr reparent to init and do kill(getpid(), SIGSTOP) ?

The difference is that if the child called PTRACE_TRACEME, then it goes
to ptrace-stop instead and no amount of SIGCONT unstucks it -- the only way
out is force killing.  I agree it's not a major issue as there's a way out
(and thus made it a parens), but I wouldn't call it nice either.

>> All this because PTRACE_TRACEME is broken by design
>
> Heh. I agree. But we can't fix it now.

Perhaps the man page could document it as deprecated, suggesting
PTRACE_ATTACH/PTRACE_SEIZE instead?

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-26 Thread Pedro Alves

On 10/25/2015 03:54 PM, Oleg Nesterov wrote:
> On 10/22, Denys Vlasenko wrote:
>>
>> On Wed, Oct 21, 2015 at 11:47 PM, Oleg Nesterov  wrote:
>>> On 10/21, Denys Vlasenko wrote:
>>>>
>>>> On 10/21/2015 09:59 PM, Denys Vlasenko wrote:
>>>>> On 10/21/2015 12:31 AM, Andrew Morton wrote:
>>>>>> Well, to fix this a distro needs to roll out a new kernel.  Or a new
>>>>>> init(8).  Is there any reason to believe that distributing/deploying a
>>>>>> new kernel is significantly easier for everyone?  Because fixing init
>>>>>> sounds like a much preferable solution to this problem.
>>>>>
>>>>> People will continue to write new init(8) implementations,
>>>>> and they will miss this obscure case.
>>>>>
>>>>> Before this bug was found, it was considered possible to use
>>>>> a shell script as init process. What now, every shell needs to add
>>>>> __WALL to its waitpids?
>>>
>>> Why not? I think it can safely use __WALL too.
>>
>> Because having any userspace program which can happen to be init,
>> which includes all shells out there in the wild
>> (bash, dash, ash, ksh, zsh, msh, hush,...)
>> learn about __WALL is wrong: apart from this wart, they do not have
>> to use any Linux-specific code. It can all be perfectly legitimate POSIX.
> 
> Yes, this is true. I meant that they could safely use __WALL to, but I
> understand that this change can be painful.
> 
>>> Sure. But people do the things which were never intended to be
>>> used all the time. We simply can not know if this "feature"
>>> already has a creative user or not.
>>
>> It won't be unfixable: they will just have to switch from PTRACE_TRACEME
>> to PTRACE_ATTACH.
>>
>> As of now we do not know any people craz^W creative enough
>> to create a cross between init and strace. If such specimens would
>> materialize, don't they deserve to have to make that change?
> 
> This also applies to people who use bash/whatever as /sbin/init and allow
> the untrusted users to run the exploits ;) I do not know who is more crazy.
> 
> In any case, the real question is whether we should change the kernel to
> fix the problem, or ask the distros to fix their init's. In the former
> case 1/2 looks simpler/safer to me than the change in ptrace_traceme(),
> and you seem to agree that 1/2 is not that bad.

A risk here seems to be that waitpid will start returning unexpected
(thread) PIDs to parent processes, and it's not unreasonable to assume
that e.g., a program asserts that waitpid either returns error or a
known (process) PID.

That's not an init-only issue, but something that might affect any
process that runs a child that happens to decide to
call PTRACE_TRACEME.

The ptrace man page says:

 "A process can initiate a trace by calling fork(2) and having the resulting
 child do a PTRACE_TRACEME, followed (typically) by an execve(2)."

Given that, can we instead make the kernel error out on PTRACE_TRACEME issued
from a non-leader thread?  Then between PTRACE_TRACEME and the parent's
waitpid, __WALL or !__WALL should make no difference.

(Also, in the original test case, if the child gets/raises a signal or execs
before exiting, the bash/init/whatever process won't be issuing PTRACE_CONT,
and the child will thus end up stuck (though should be SIGKILLable,
I believe).  All this because PTRACE_TRACEME is broken by design by making it
be the child's choice whether to be traced...)

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-26 Thread Pedro Alves

On 10/25/2015 03:54 PM, Oleg Nesterov wrote:
> On 10/22, Denys Vlasenko wrote:
>>
>> On Wed, Oct 21, 2015 at 11:47 PM, Oleg Nesterov <o...@redhat.com> wrote:
>>> On 10/21, Denys Vlasenko wrote:
>>>>
>>>> On 10/21/2015 09:59 PM, Denys Vlasenko wrote:
>>>>> On 10/21/2015 12:31 AM, Andrew Morton wrote:
>>>>>> Well, to fix this a distro needs to roll out a new kernel.  Or a new
>>>>>> init(8).  Is there any reason to believe that distributing/deploying a
>>>>>> new kernel is significantly easier for everyone?  Because fixing init
>>>>>> sounds like a much preferable solution to this problem.
>>>>>
>>>>> People will continue to write new init(8) implementations,
>>>>> and they will miss this obscure case.
>>>>>
>>>>> Before this bug was found, it was considered possible to use
>>>>> a shell script as init process. What now, every shell needs to add
>>>>> __WALL to its waitpids?
>>>
>>> Why not? I think it can safely use __WALL too.
>>
>> Because having any userspace program which can happen to be init,
>> which includes all shells out there in the wild
>> (bash, dash, ash, ksh, zsh, msh, hush,...)
>> learn about __WALL is wrong: apart from this wart, they do not have
>> to use any Linux-specific code. It can all be perfectly legitimate POSIX.
> 
> Yes, this is true. I meant that they could safely use __WALL to, but I
> understand that this change can be painful.
> 
>>> Sure. But people do the things which were never intended to be
>>> used all the time. We simply can not know if this "feature"
>>> already has a creative user or not.
>>
>> It won't be unfixable: they will just have to switch from PTRACE_TRACEME
>> to PTRACE_ATTACH.
>>
>> As of now we do not know any people craz^W creative enough
>> to create a cross between init and strace. If such specimens would
>> materialize, don't they deserve to have to make that change?
> 
> This also applies to people who use bash/whatever as /sbin/init and allow
> the untrusted users to run the exploits ;) I do not know who is more crazy.
> 
> In any case, the real question is whether we should change the kernel to
> fix the problem, or ask the distros to fix their init's. In the former
> case 1/2 looks simpler/safer to me than the change in ptrace_traceme(),
> and you seem to agree that 1/2 is not that bad.

A risk here seems to be that waitpid will start returning unexpected
(thread) PIDs to parent processes, and it's not unreasonable to assume
that e.g., a program asserts that waitpid either returns error or a
known (process) PID.

That's not an init-only issue, but something that might affect any
process that runs a child that happens to decide to
call PTRACE_TRACEME.

The ptrace man page says:

 "A process can initiate a trace by calling fork(2) and having the resulting
 child do a PTRACE_TRACEME, followed (typically) by an execve(2)."

Given that, can we instead make the kernel error out on PTRACE_TRACEME issued
from a non-leader thread?  Then between PTRACE_TRACEME and the parent's
waitpid, __WALL or !__WALL should make no difference.

(Also, in the original test case, if the child gets/raises a signal or execs
before exiting, the bash/init/whatever process won't be issuing PTRACE_CONT,
and the child will thus end up stuck (though should be SIGKILLable,
I believe).  All this because PTRACE_TRACEME is broken by design by making it
be the child's choice whether to be traced...)

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-22 Thread Pedro Alves

On 10/20/2015 06:17 PM, Oleg Nesterov wrote:

> Jan, Pedro, could you please confirm this won't break gdb? I tried
> to look into gdb-7.1, and at first glance gdb uses __WCLONE only
> because __WALL doesn't work on older kernels, iow it seems to me
> that gdb actually wants __WALL so this change should be fine.

Right, gdb actually wants __WALL, but it doesn't use it to keep
compatibility with kernels that predate it.

gdb nowadays has an __WALL emulation waitpid wrapper
(alternates __WCLONE+WNOHANG with 0+WNOHANG, blocks
on sigsuspend/SIGCHLD):

 
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/nat/linux-waitpid.c;h=cbcdd95afa9c664993542b0b3851e79fbae4e1df;hb=HEAD#l77

Though it's not used everywhere.  Some older code in the
ptrace backend open codes the "try __WCLONE, then try !__WCLONE."
dance.

Seems like __WALL was added in Linux 2.4; gdb could probably
assume it's available nowadays...

In any case, to make sure existing gdb binaries would still work
with your kernel change, I ran GDB's testsuite with this:

~~
diff --git a/gdb/nat/linux-waitpid.c b/gdb/nat/linux-waitpid.c
index cbcdd95..864ba2e 100644
--- a/gdb/nat/linux-waitpid.c
+++ b/gdb/nat/linux-waitpid.c
@@ -149,3 +149,17 @@ my_waitpid (int pid, int *status, int flags)
   errno = out_errno;
   return ret;
 }
+
+#include 
+
+pid_t
+waitpid (pid_t pid, int *status, int options)
+{
+  static pid_t (*waitpid2) (pid_t pid, int *status, int options) = NULL;
+
+  if (waitpid2 == NULL)
+waitpid2 = dlsym (RTLD_NEXT, "waitpid");
+
+  options |= __WALL;
+  return waitpid2 (pid, status, options);
+}
~~

and got no regressions.  So seems like all would be well from
GDB's perspective.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] wait/ptrace: always assume __WALL if the child is traced

2015-10-22 Thread Pedro Alves

On 10/20/2015 06:17 PM, Oleg Nesterov wrote:

> Jan, Pedro, could you please confirm this won't break gdb? I tried
> to look into gdb-7.1, and at first glance gdb uses __WCLONE only
> because __WALL doesn't work on older kernels, iow it seems to me
> that gdb actually wants __WALL so this change should be fine.

Right, gdb actually wants __WALL, but it doesn't use it to keep
compatibility with kernels that predate it.

gdb nowadays has an __WALL emulation waitpid wrapper
(alternates __WCLONE+WNOHANG with 0+WNOHANG, blocks
on sigsuspend/SIGCHLD):

 
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/nat/linux-waitpid.c;h=cbcdd95afa9c664993542b0b3851e79fbae4e1df;hb=HEAD#l77

Though it's not used everywhere.  Some older code in the
ptrace backend open codes the "try __WCLONE, then try !__WCLONE."
dance.

Seems like __WALL was added in Linux 2.4; gdb could probably
assume it's available nowadays...

In any case, to make sure existing gdb binaries would still work
with your kernel change, I ran GDB's testsuite with this:

~~
diff --git a/gdb/nat/linux-waitpid.c b/gdb/nat/linux-waitpid.c
index cbcdd95..864ba2e 100644
--- a/gdb/nat/linux-waitpid.c
+++ b/gdb/nat/linux-waitpid.c
@@ -149,3 +149,17 @@ my_waitpid (int pid, int *status, int flags)
   errno = out_errno;
   return ret;
 }
+
+#include 
+
+pid_t
+waitpid (pid_t pid, int *status, int options)
+{
+  static pid_t (*waitpid2) (pid_t pid, int *status, int options) = NULL;
+
+  if (waitpid2 == NULL)
+waitpid2 = dlsym (RTLD_NEXT, "waitpid");
+
+  options |= __WALL;
+  return waitpid2 (pid, status, options);
+}
~~

and got no regressions.  So seems like all would be well from
GDB's perspective.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 02/10] x86: Compile-time asm code validation

2015-06-12 Thread Pedro Alves

On 06/12/2015 03:10 PM, Josh Poimboeuf wrote:
> On Fri, Jun 12, 2015 at 12:18:16PM +0100, Pedro Alves wrote:
>> On 06/11/2015 03:10 PM, Josh Poimboeuf wrote:
>>
>>> C would definitely make more sense when analyzing object code.  In fact,
>>> asmvalidate is written in C.  But then I guess we'd have to re-implement
>>> the .cfi stuff and populate the DWARF sections manually instead of
>>> letting the assembler do it.
>>
>> Was doing all this directly in the assembler considered?  That is,
>> e.g., add some knob that makes it error/warn in the same conditions
>> you're making the validator catch.  For tail calls, you'd e.g., add
>> some  new ".nonlocal" directive that you'd use to whitelist the
>> following jump.  And then if it's possible run a CFI generator
>> as a separate step over the source, it sounds like it should also
>> be possible to have the assembler do it instead too (again with
>> some new high level directive to trigger/help it).
> 
> In general I think doing these types of things in the assembler would be
> a good idea.  Missing or inaccurate debug data for asm code seems to be
> a common problem for other projects as well.  As Andy pointed out,
> they're doing similar things in musl [1].

Thanks for the pointer.

> So it might be useful to add
> an option to the assembler which validates that the code conforms to
> certain structural rules, and then inserts frame pointer and/or .cfi
> directives.

> That said, the kernel has much more custom features than other projects.
> There are some sneaky macros, like _ASM_EXTABLE and ALTERNATIVE, which
> hide code in various sections.  Unless we're able to somehow either stop
> using these macros or isolate them to a few places, I doubt that such a
> general purpose assembler option would work.

How does the asmvalidator handle these?

> [1] http://www.openwall.com/lists/musl/2015/05/31/5

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 02/10] x86: Compile-time asm code validation

2015-06-12 Thread Pedro Alves

On 06/11/2015 03:10 PM, Josh Poimboeuf wrote:

> C would definitely make more sense when analyzing object code.  In fact,
> asmvalidate is written in C.  But then I guess we'd have to re-implement
> the .cfi stuff and populate the DWARF sections manually instead of
> letting the assembler do it.

Was doing all this directly in the assembler considered?  That is,
e.g., add some knob that makes it error/warn in the same conditions
you're making the validator catch.  For tail calls, you'd e.g., add
some  new ".nonlocal" directive that you'd use to whitelist the
following jump.  And then if it's possible run a CFI generator
as a separate step over the source, it sounds like it should also
be possible to have the assembler do it instead too (again with
some new high level directive to trigger/help it).

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 02/10] x86: Compile-time asm code validation

2015-06-12 Thread Pedro Alves

On 06/11/2015 03:10 PM, Josh Poimboeuf wrote:

 C would definitely make more sense when analyzing object code.  In fact,
 asmvalidate is written in C.  But then I guess we'd have to re-implement
 the .cfi stuff and populate the DWARF sections manually instead of
 letting the assembler do it.

Was doing all this directly in the assembler considered?  That is,
e.g., add some knob that makes it error/warn in the same conditions
you're making the validator catch.  For tail calls, you'd e.g., add
some  new .nonlocal directive that you'd use to whitelist the
following jump.  And then if it's possible run a CFI generator
as a separate step over the source, it sounds like it should also
be possible to have the assembler do it instead too (again with
some new high level directive to trigger/help it).

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 02/10] x86: Compile-time asm code validation

2015-06-12 Thread Pedro Alves

On 06/12/2015 03:10 PM, Josh Poimboeuf wrote:
 On Fri, Jun 12, 2015 at 12:18:16PM +0100, Pedro Alves wrote:
 On 06/11/2015 03:10 PM, Josh Poimboeuf wrote:

 C would definitely make more sense when analyzing object code.  In fact,
 asmvalidate is written in C.  But then I guess we'd have to re-implement
 the .cfi stuff and populate the DWARF sections manually instead of
 letting the assembler do it.

 Was doing all this directly in the assembler considered?  That is,
 e.g., add some knob that makes it error/warn in the same conditions
 you're making the validator catch.  For tail calls, you'd e.g., add
 some  new .nonlocal directive that you'd use to whitelist the
 following jump.  And then if it's possible run a CFI generator
 as a separate step over the source, it sounds like it should also
 be possible to have the assembler do it instead too (again with
 some new high level directive to trigger/help it).
 
 In general I think doing these types of things in the assembler would be
 a good idea.  Missing or inaccurate debug data for asm code seems to be
 a common problem for other projects as well.  As Andy pointed out,
 they're doing similar things in musl [1].

Thanks for the pointer.

 So it might be useful to add
 an option to the assembler which validates that the code conforms to
 certain structural rules, and then inserts frame pointer and/or .cfi
 directives.

 That said, the kernel has much more custom features than other projects.
 There are some sneaky macros, like _ASM_EXTABLE and ALTERNATIVE, which
 hide code in various sections.  Unless we're able to somehow either stop
 using these macros or isolate them to a few places, I doubt that such a
 general purpose assembler option would work.

How does the asmvalidator handle these?

 [1] http://www.openwall.com/lists/musl/2015/05/31/5

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)

2015-03-16 Thread Pedro Alves

Thanks for looking over all this, guys.  Really appreciated.

On 03/16/2015 07:01 PM, Oleg Nesterov wrote:
> is wrong. To verify, I changed show_map_vma() to expose pgoff even if !file,
> but this test-case can show the problem too:

Might be good to add tests like this to selftests/ once all
this is sorted.

Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: install_special_mapping vm_pgoff (Was: vvar, gup coredump)

2015-03-16 Thread Pedro Alves

Thanks for looking over all this, guys.  Really appreciated.

On 03/16/2015 07:01 PM, Oleg Nesterov wrote:
 is wrong. To verify, I changed show_map_vma() to expose pgoff even if !file,
 but this test-case can show the problem too:

Might be good to add tests like this to selftests/ once all
this is sorted.

Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: vvar, gup && coredump

2015-03-12 Thread Pedro Alves

On 03/12/2015 05:46 PM, Oleg Nesterov wrote:
> On 03/12, Oleg Nesterov wrote:
>>
>> Yes, this is true. OK, lets not dump it.
> 
> OTOH. We can probably add ->access() into special_mapping_vmops, this
> way __access_remote_vm() could work even if gup() fails ?
> 
> Jan, Sergio. How much do we want do dump this area ? The change above
> should be justified.

Memory mappings that weren't touched since they were initially mapped can
be retrieved from the program binary and the shared libraries, even if
the core dump is moved to another machine.  However, in vvar case,
sounds like  there's nowhere to read it from offline?  In that case,
it could be justified to dump it.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: vvar, gup coredump

2015-03-12 Thread Pedro Alves

On 03/12/2015 05:46 PM, Oleg Nesterov wrote:
 On 03/12, Oleg Nesterov wrote:

 Yes, this is true. OK, lets not dump it.
 
 OTOH. We can probably add -access() into special_mapping_vmops, this
 way __access_remote_vm() could work even if gup() fails ?
 
 Jan, Sergio. How much do we want do dump this area ? The change above
 should be justified.

Memory mappings that weren't touched since they were initially mapped can
be retrieved from the program binary and the shared libraries, even if
the core dump is moved to another machine.  However, in vvar case,
sounds like  there's nowhere to read it from offline?  In that case,
it could be justified to dump it.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ptrace/x86: fix the TIF_FORCED_TF logic in handle_signal()

2014-11-05 Thread Pedro Alves

On 11/04/2014 11:55 PM, Oleg Nesterov wrote:
> On 11/03, Pedro Alves wrote:
>>
>> Thanks a lot Oleg.
> 
> thanks for the detailed report ;)
> 
>> Question - shouldn't ptrace tests be put in
>> tools/testing/selftests/ptrace/ in the kernel tree nowadays?
> 
> Oh, I do not know. Personally I am not sure that selftests/ptrace/
> makes sense and I did not even know (or forgot) we already have it.

I didn't know about it either until recently.

> To me it would be better to move the single peeksiginfo.c to ptrace
> testsuite and remove this dir.
> 
> That said, I certainly won't argue if you or Jan will maintain
> selftests/ptrace and send the patches with the new test-cases ;)

IMO, having the tests in the kernel tree might make it simpler
to require fixes to come along with tests, so we're better covered
against regressions, but whatever works best to whoever is
maintaining ptrace is good for me.  ;-)

> 
> The only problem is that every new test-case should be justified
> somehow. For example, should we add the test-case from this changelog
> into selftests/ptrace/ ? 

IMO, we should have a ptrace test for this somewhere.  I just don't
know about the intention of selftests/ptrace/, but it looked like
it was meant as a replacement for the out of tree testsuite.
Having the ptrace testsuite in the tree makes sense to me,
for making it easier to require ptrace fixes and extensions to
come along with tests, and to make it easier to catch regressions
closer to the source, and to serve as self-documentation on
expected behavior, but I don't have time to actively pursue
that myself, unfortunately.

> Honestly, I do not know. This bug is minor,
> and probably we do not want a test-case for every bug we fix. 

My view is that there was a bug, and a test case would help
prevent re-introducing it.

> So I'd
> leave this to you, you know how ptrace is actually _used_ and what is
> important for gdb.

Right.  FYI, I've added a testcase to gdb as well, to cover the use
case from a higher level functionality view:

 https://sourceware.org/ml/gdb-patches/2014-10/msg00780.html

(Between a bug/regression being introduced in the kernel and running
the gdb testsuite against that, a lot of time may pass.  Myself,
I usually only test and develop against whatever kernel the
distro ships, which is a released kernel.  And gdb doesn't use
every ptrace feature.)

> 
> The same for other tests in ptrace testsuite. Some of them are really
> "random", in any case (I think) we should not blindly put them all into
> selftests/ptrace. Not to mention the coding style which should be fixed.

Agreed.

> 
> And I know that gdb has a lot of internal tests, and gdb developers
> run them (and ptrace tests) to check the new kernels... Who else do
> you think will use selftests/ptrace?

IMO, whoever touches ptrace code in the kernel should be
required to run the ptrace tests, but not the testsuites of
random other tools that use ptrace (like gdb, strace and others).
Having the ptrace tests in the tree sounded like might make
such a requirement simpler, and I naively thought that the existence
of the directory might mean that that's the direction kernel folks
had agreed on, that's all.

> 
> But again, if you/Jan want to add something into selftests - please
> send a patch. I will even try to review it but only in a sense that
> I will try to convince myself that I understand _what_ this test does,
> not _why_ we need this test-case. You certainly understand "why" much
> better.
> 
> Add Denys, perhaps he also has some tests for strace.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ptrace/x86: fix the TIF_FORCED_TF logic in handle_signal()

2014-11-05 Thread Pedro Alves

On 11/04/2014 11:55 PM, Oleg Nesterov wrote:
 On 11/03, Pedro Alves wrote:

 Thanks a lot Oleg.
 
 thanks for the detailed report ;)
 
 Question - shouldn't ptrace tests be put in
 tools/testing/selftests/ptrace/ in the kernel tree nowadays?
 
 Oh, I do not know. Personally I am not sure that selftests/ptrace/
 makes sense and I did not even know (or forgot) we already have it.

I didn't know about it either until recently.

 To me it would be better to move the single peeksiginfo.c to ptrace
 testsuite and remove this dir.
 
 That said, I certainly won't argue if you or Jan will maintain
 selftests/ptrace and send the patches with the new test-cases ;)

IMO, having the tests in the kernel tree might make it simpler
to require fixes to come along with tests, so we're better covered
against regressions, but whatever works best to whoever is
maintaining ptrace is good for me.  ;-)

 
 The only problem is that every new test-case should be justified
 somehow. For example, should we add the test-case from this changelog
 into selftests/ptrace/ ? 

IMO, we should have a ptrace test for this somewhere.  I just don't
know about the intention of selftests/ptrace/, but it looked like
it was meant as a replacement for the out of tree testsuite.
Having the ptrace testsuite in the tree makes sense to me,
for making it easier to require ptrace fixes and extensions to
come along with tests, and to make it easier to catch regressions
closer to the source, and to serve as self-documentation on
expected behavior, but I don't have time to actively pursue
that myself, unfortunately.

 Honestly, I do not know. This bug is minor,
 and probably we do not want a test-case for every bug we fix. 

My view is that there was a bug, and a test case would help
prevent re-introducing it.

 So I'd
 leave this to you, you know how ptrace is actually _used_ and what is
 important for gdb.

Right.  FYI, I've added a testcase to gdb as well, to cover the use
case from a higher level functionality view:

 https://sourceware.org/ml/gdb-patches/2014-10/msg00780.html

(Between a bug/regression being introduced in the kernel and running
the gdb testsuite against that, a lot of time may pass.  Myself,
I usually only test and develop against whatever kernel the
distro ships, which is a released kernel.  And gdb doesn't use
every ptrace feature.)

 
 The same for other tests in ptrace testsuite. Some of them are really
 random, in any case (I think) we should not blindly put them all into
 selftests/ptrace. Not to mention the coding style which should be fixed.

Agreed.

 
 And I know that gdb has a lot of internal tests, and gdb developers
 run them (and ptrace tests) to check the new kernels... Who else do
 you think will use selftests/ptrace?

IMO, whoever touches ptrace code in the kernel should be
required to run the ptrace tests, but not the testsuites of
random other tools that use ptrace (like gdb, strace and others).
Having the ptrace tests in the tree sounded like might make
such a requirement simpler, and I naively thought that the existence
of the directory might mean that that's the direction kernel folks
had agreed on, that's all.

 
 But again, if you/Jan want to add something into selftests - please
 send a patch. I will even try to review it but only in a sense that
 I will try to convince myself that I understand _what_ this test does,
 not _why_ we need this test-case. You certainly understand why much
 better.
 
 Add Denys, perhaps he also has some tests for strace.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] ptrace/x86: fix the TIF_FORCED_TF logic in handle_signal()

2014-11-03 Thread Pedro Alves

Thanks a lot Oleg.

Question - shouldn't ptrace tests be put in
tools/testing/selftests/ptrace/ in the kernel tree nowadays?

Thanks,
Pedro Alves

On 11/03/2014 08:13 PM, Oleg Nesterov wrote:
> When the TIF_SINGLESTEP tracee dequeues a signal, handle_signal()
> clears TIF_FORCED_TF and X86_EFLAGS_TF but leaves TIF_SINGLESTEP set.
> 
> If the tracer does PTRACE_SINGLESTEP again, enable_single_step() sets
> X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the subsequent
> PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the tracee gets the
> wrong SIGTRAP.
> 
> Test-case (needs -O2 to avoid prologue insns in signal handler):
> 
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
>   #include 
> 
>   void handler(int n)
>   {
>   asm("nop");
>   }
> 
>   int child(void)
>   {
>   assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
>   signal(SIGALRM, handler);
>   kill(getpid(), SIGALRM);
>   return 0x23;
>   }
> 
>   void *getip(int pid)
>   {
>   return (void*)ptrace(PTRACE_PEEKUSER, pid,
>   offsetof(struct user, regs.rip), 0);
>   }
> 
>   int main(void)
>   {
>   int pid, status;
> 
>   pid = fork();
>   if (!pid)
>   return child();
> 
>   assert(wait() == pid);
>   assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
> 
>   assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
>   assert(wait() == pid);
>   assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
>   assert((getip(pid) - (void*)handler) == 0);
> 
>   assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
>   assert(wait() == pid);
>   assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
>   assert((getip(pid) - (void*)handler) == 1);
> 
>   assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
>   assert(wait() == pid);
>   assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
> 
>   return 0;
>   }
> 
> The last assert() fails because PTRACE_CONT wrongly triggers another
> single-step and X86_EFLAGS_TF can't be cleared by debugger until the
> tracee does sys_rt_sigreturn().
> 
> Change handle_signal() to do user_disable_single_step() if stepping,
> we do not need to preserve TIF_SINGLESTEP because we are going to do
> ptrace_notify(), and it is simply wrong to leak this bit.
> 
> While at it, change the comment to explain why we also need to clear
> TF unconditionally after setup_rt_frame().
> 
> Note: in the longer term we should probably change setup_sigcontext()
> to use get_flags() and then just remove this user_disable_single_step().
> And, the state of TIF_FORCED_TF can be wrong after restore_sigcontext()
> which can set/clear TF, this needs another fix.
> 
> Reported-by: Evan Teran 
> Reported-by: Pedro Alves 
> Signed-off-by: Oleg Nesterov 
> ---
>  arch/x86/kernel/signal.c |   22 +++---
>  1 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index ed37a76..9d3a15b 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -629,7 +629,8 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
>  static void
>  handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>  {
> - bool failed;
> + bool stepping, failed;
> +
>   /* Are we from a system call? */
>   if (syscall_get_nr(current, regs) >= 0) {
>   /* If so, check system call restarting.. */
> @@ -653,12 +654,13 @@ handle_signal(struct ksignal *ksig, struct pt_regs 
> *regs)
>   }
>  
>   /*
> -  * If TF is set due to a debugger (TIF_FORCED_TF), clear the TF
> -  * flag so that register information in the sigcontext is correct.
> +  * If TF is set due to a debugger (TIF_FORCED_TF), clear TF now
> +  * so that register information in the sigcontext is correct and
> +  * then notify the tracer before entering the signal handler.
>*/
> - if (unlikely(regs->flags & X86_EFLAGS_TF) &&
> - likely(test_and_clear_thread_flag(TIF_FORCED_TF)))
> - regs->flags &= ~X86_EFLAGS_TF;
> + stepping = test_thread_flag(TIF_SINGLESTEP);
> + if (stepping)
> + user_disable_single_step(current);
>  
>   failed = (s

Re: [PATCH 1/1] ptrace/x86: fix the TIF_FORCED_TF logic in handle_signal()

2014-11-03 Thread Pedro Alves

Thanks a lot Oleg.

Question - shouldn't ptrace tests be put in
tools/testing/selftests/ptrace/ in the kernel tree nowadays?

Thanks,
Pedro Alves

On 11/03/2014 08:13 PM, Oleg Nesterov wrote:
 When the TIF_SINGLESTEP tracee dequeues a signal, handle_signal()
 clears TIF_FORCED_TF and X86_EFLAGS_TF but leaves TIF_SINGLESTEP set.
 
 If the tracer does PTRACE_SINGLESTEP again, enable_single_step() sets
 X86_EFLAGS_TF but not TIF_FORCED_TF. This means that the subsequent
 PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the tracee gets the
 wrong SIGTRAP.
 
 Test-case (needs -O2 to avoid prologue insns in signal handler):
 
   #include unistd.h
   #include stdio.h
   #include sys/ptrace.h
   #include sys/wait.h
   #include sys/user.h
   #include assert.h
   #include stddef.h
 
   void handler(int n)
   {
   asm(nop);
   }
 
   int child(void)
   {
   assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
   signal(SIGALRM, handler);
   kill(getpid(), SIGALRM);
   return 0x23;
   }
 
   void *getip(int pid)
   {
   return (void*)ptrace(PTRACE_PEEKUSER, pid,
   offsetof(struct user, regs.rip), 0);
   }
 
   int main(void)
   {
   int pid, status;
 
   pid = fork();
   if (!pid)
   return child();
 
   assert(wait(status) == pid);
   assert(WIFSTOPPED(status)  WSTOPSIG(status) == SIGALRM);
 
   assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
   assert(wait(status) == pid);
   assert(WIFSTOPPED(status)  WSTOPSIG(status) == SIGTRAP);
   assert((getip(pid) - (void*)handler) == 0);
 
   assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
   assert(wait(status) == pid);
   assert(WIFSTOPPED(status)  WSTOPSIG(status) == SIGTRAP);
   assert((getip(pid) - (void*)handler) == 1);
 
   assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
   assert(wait(status) == pid);
   assert(WIFEXITED(status)  WEXITSTATUS(status) == 0x23);
 
   return 0;
   }
 
 The last assert() fails because PTRACE_CONT wrongly triggers another
 single-step and X86_EFLAGS_TF can't be cleared by debugger until the
 tracee does sys_rt_sigreturn().
 
 Change handle_signal() to do user_disable_single_step() if stepping,
 we do not need to preserve TIF_SINGLESTEP because we are going to do
 ptrace_notify(), and it is simply wrong to leak this bit.
 
 While at it, change the comment to explain why we also need to clear
 TF unconditionally after setup_rt_frame().
 
 Note: in the longer term we should probably change setup_sigcontext()
 to use get_flags() and then just remove this user_disable_single_step().
 And, the state of TIF_FORCED_TF can be wrong after restore_sigcontext()
 which can set/clear TF, this needs another fix.
 
 Reported-by: Evan Teran ete...@alum.rit.edu
 Reported-by: Pedro Alves pal...@redhat.com
 Signed-off-by: Oleg Nesterov o...@redhat.com
 ---
  arch/x86/kernel/signal.c |   22 +++---
  1 files changed, 11 insertions(+), 11 deletions(-)
 
 diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
 index ed37a76..9d3a15b 100644
 --- a/arch/x86/kernel/signal.c
 +++ b/arch/x86/kernel/signal.c
 @@ -629,7 +629,8 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
  static void
  handle_signal(struct ksignal *ksig, struct pt_regs *regs)
  {
 - bool failed;
 + bool stepping, failed;
 +
   /* Are we from a system call? */
   if (syscall_get_nr(current, regs) = 0) {
   /* If so, check system call restarting.. */
 @@ -653,12 +654,13 @@ handle_signal(struct ksignal *ksig, struct pt_regs 
 *regs)
   }
  
   /*
 -  * If TF is set due to a debugger (TIF_FORCED_TF), clear the TF
 -  * flag so that register information in the sigcontext is correct.
 +  * If TF is set due to a debugger (TIF_FORCED_TF), clear TF now
 +  * so that register information in the sigcontext is correct and
 +  * then notify the tracer before entering the signal handler.
*/
 - if (unlikely(regs-flags  X86_EFLAGS_TF) 
 - likely(test_and_clear_thread_flag(TIF_FORCED_TF)))
 - regs-flags = ~X86_EFLAGS_TF;
 + stepping = test_thread_flag(TIF_SINGLESTEP);
 + if (stepping)
 + user_disable_single_step(current);
  
   failed = (setup_rt_frame(ksig, regs)  0);
   if (!failed) {
 @@ -669,10 +671,8 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
* it might disable possible debug exception from the
* signal handler.
*
 -  * Clear TF when entering the signal handler, but
 -  * notify any tracer that was single-stepping it.
 -  * The tracer may want

Re: [PATCH] elf, coredump: Extract only the active register set during core dump

2014-05-27 Thread Pedro Alves

On 05/23/2014 06:16 AM, Anshuman Khandual wrote:
> Regset active hooks provide a way to query how many registers in the
> register set are active at any point of time. Currently this information
> is being ignored while creating core dump sections corresponding to any
> core note register set. This way the core dump will contain data which are
> not part of the active context of the process and may not be useful. This
> patch will make sure that only the active part of the register set are
> captured during the core dump process which will reduce the core dump
> size.
> 
> Signed-off-by: Anshuman Khandual 
> ---
> NOTE:
> Pedro Alves has mentioned that producing smaller note sections in the core
> dump may break some existing consumers. I request suggestions, reviews and
> test reports on different architectures to prove that this patch does not
> break any existing consumer. Thank you.

Yeah, FYI, I mentioned that after noticing that ia64 does:

mainline/linux-2.6/arch/ia64/kernel/ptrace.c:

 static int
 fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
 return (target->thread.flags & IA64_THREAD_FPH_VALID) ? 128 : 32;
 }

And it's likely that tools expect fpregset_t to have a fixed size.

include/uapi/linux/elfcore.h:

 22 typedef elf_fpregset_t fpregset_t;

arch/ia64/include/asm/elf.h:

 186 typedef struct ia64_fpreg elf_fpreg_t;
 187 typedef elf_fpreg_t elf_fpregset_t[ELF_NFPREG];

arch/ia64/include/asm/elf.h:

 154 #define ELF_NGREG   128 /* we really need just 72 but let's leave 
some headroom... */
 155 #define ELF_NFPREG  128 /* f0 and f1 could be omitted, but so 
what... */


I haven't done an exhaustive look over ports.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] elf, coredump: Extract only the active register set during core dump

2014-05-27 Thread Pedro Alves

On 05/23/2014 06:16 AM, Anshuman Khandual wrote:
 Regset active hooks provide a way to query how many registers in the
 register set are active at any point of time. Currently this information
 is being ignored while creating core dump sections corresponding to any
 core note register set. This way the core dump will contain data which are
 not part of the active context of the process and may not be useful. This
 patch will make sure that only the active part of the register set are
 captured during the core dump process which will reduce the core dump
 size.
 
 Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
 ---
 NOTE:
 Pedro Alves has mentioned that producing smaller note sections in the core
 dump may break some existing consumers. I request suggestions, reviews and
 test reports on different architectures to prove that this patch does not
 break any existing consumer. Thank you.

Yeah, FYI, I mentioned that after noticing that ia64 does:

mainline/linux-2.6/arch/ia64/kernel/ptrace.c:

 static int
 fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
 return (target-thread.flags  IA64_THREAD_FPH_VALID) ? 128 : 32;
 }

And it's likely that tools expect fpregset_t to have a fixed size.

include/uapi/linux/elfcore.h:

 22 typedef elf_fpregset_t fpregset_t;

arch/ia64/include/asm/elf.h:

 186 typedef struct ia64_fpreg elf_fpreg_t;
 187 typedef elf_fpreg_t elf_fpregset_t[ELF_NFPREG];

arch/ia64/include/asm/elf.h:

 154 #define ELF_NGREG   128 /* we really need just 72 but let's leave 
some headroom... */
 155 #define ELF_NFPREG  128 /* f0 and f1 could be omitted, but so 
what... */


I haven't done an exhaustive look over ports.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-20 Thread Pedro Alves

On 05/20/2014 09:14 AM, Anshuman Khandual wrote:
> On 05/19/2014 08:13 PM, Pedro Alves wrote:
>> On 05/19/2014 12:46 PM, Anshuman Khandual wrote:
>>
>>>>> I couldn't actually find any arch that currently returns -ENODEV in
>>>>> the "active" hook.  I see that binfmt_elf.c doesn't handle
>>>>> regset->active() returning < 0.  Guess that may be why.  Looks like
>>>>> something that could be cleaned up, to me.
>>>>>
>>> Also it does not consider the return value of regset->active(t->task, 
>>> regset)
>>> (whose objective is to figure out whether we need to request regset->n 
>>> number
>>> of elements or less than that) in the subsequent call to regset->get 
>>> function.
>>
>> Indeed.
>>
>> TBC, do you plan on fixing this?  Otherwise ...
> 
> Sure, thinking something like this as mentioned below. But still not sure how 
> to use
> the return type of -ENODEV from the function regset->active(). Right now if 
> any
> regset does have the active hook and it returns anything but positive value, 
> it will
> be ignored and the control moves to the next regset in view. This prevents 
> the thread
> core note type being written to the core dump.

Looks to me that that's exactly what should happen for -ENODEV too.  The regset
should be ignored.  If regset->active() returns -ENODEV, then the machine
doesn't have the registers at all, so what makes sense to me is to not write the
corresponding core note in the dump.  IOW, on such a machine, the kernel
generates a core exactly like if the support for these registers that don't
make sense for this machine wasn't compiled in at all.  And generates a core
exactly like an older kernel that didn't know about that regset
(which is fine for that same machine) yet.

> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index aa3cb62..80672fb 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct 
> elf_thread_core_info *t,
> if (regset->core_note_type && regset->get &&
> (!regset->active || regset->active(t->task, regset))) {
> int ret;

So, here, this ?

(!regset->active || regset->active(t->task, regset) > 0)) {

> -   size_t size = regset->n * regset->size;
> +   size_t size;
> +
> +   /* Request only the active elements in the regset */
> +   if (!regset->active)
> +   size = regset->n * regset->size;
> +   else
> +   size = regset->active(t->task, regset)
> +   * 
> regset->size;
> +

I wonder if it wouldn't be cleaner to add a function like:

int
regset_active (tast *task, regseg *regset)
{
   if (!regset->active)
return regset->n * regset->size;
   else
return regset->active(task, regset);
}

And then use it like

   if (regset->core_note_type && regset->get) {
   int size = regset_active (t->task, regset);

   if (size > 0) {
  ...
   }

Though at this point, we don't actually make use of
the distinction between -ENODEV vs 0.  Guess that's what
we should be thinking about.  Seems like there some details that
need to be sorted out, and some verification that consumers aren't
broken by outputting smaller notes -- e.g., ia64 makes me
wonder that.

Maybe we should leave this for another day, and have tm_spr_active
return 0 instead of -ENODEV when the machine doesn't have the hardware,
or not install that hook at all.  Seems like the effect will be the same,
as the note isn't output if ->get fails.

> void *data = kmalloc(size, GFP_KERNEL);
> if (unlikely(!data))
> return 0;
> 
>>
>>> Now coming to the installation of the .active hooks part for all the new 
>>> regsets, it
>>> should be pretty straight forward as well. Though its optional and used for 
>>> elf_core_dump
>>> purpose only, its worth adding them here. Example of an active function 
>>> should be something
>>> like this. The function is inexpensive as required.
>>>
>>> +static int tm_spr_active(struct task_struct *target,
>>> +       const struct user_regset *regset)
>>> +{
>>> +   if (!cpu_has_feature(CPU_FTR_TM))
>>> +

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-20 Thread Pedro Alves

On 05/20/2014 09:14 AM, Anshuman Khandual wrote:
 On 05/19/2014 08:13 PM, Pedro Alves wrote:
 On 05/19/2014 12:46 PM, Anshuman Khandual wrote:

 I couldn't actually find any arch that currently returns -ENODEV in
 the active hook.  I see that binfmt_elf.c doesn't handle
 regset-active() returning  0.  Guess that may be why.  Looks like
 something that could be cleaned up, to me.

 Also it does not consider the return value of regset-active(t-task, 
 regset)
 (whose objective is to figure out whether we need to request regset-n 
 number
 of elements or less than that) in the subsequent call to regset-get 
 function.

 Indeed.

 TBC, do you plan on fixing this?  Otherwise ...
 
 Sure, thinking something like this as mentioned below. But still not sure how 
 to use
 the return type of -ENODEV from the function regset-active(). Right now if 
 any
 regset does have the active hook and it returns anything but positive value, 
 it will
 be ignored and the control moves to the next regset in view. This prevents 
 the thread
 core note type being written to the core dump.

Looks to me that that's exactly what should happen for -ENODEV too.  The regset
should be ignored.  If regset-active() returns -ENODEV, then the machine
doesn't have the registers at all, so what makes sense to me is to not write the
corresponding core note in the dump.  IOW, on such a machine, the kernel
generates a core exactly like if the support for these registers that don't
make sense for this machine wasn't compiled in at all.  And generates a core
exactly like an older kernel that didn't know about that regset
(which is fine for that same machine) yet.

 
 diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
 index aa3cb62..80672fb 100644
 --- a/fs/binfmt_elf.c
 +++ b/fs/binfmt_elf.c
 @@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct 
 elf_thread_core_info *t,
 if (regset-core_note_type  regset-get 
 (!regset-active || regset-active(t-task, regset))) {
 int ret;

So, here, this ?

(!regset-active || regset-active(t-task, regset)  0)) {


 -   size_t size = regset-n * regset-size;
 +   size_t size;
 +
 +   /* Request only the active elements in the regset */
 +   if (!regset-active)
 +   size = regset-n * regset-size;
 +   else
 +   size = regset-active(t-task, regset)
 +   * 
 regset-size;
 +


I wonder if it wouldn't be cleaner to add a function like:

int
regset_active (tast *task, regseg *regset)
{
   if (!regset-active)
return regset-n * regset-size;
   else
return regset-active(task, regset);
}

And then use it like

   if (regset-core_note_type  regset-get) {
   int size = regset_active (t-task, regset);

   if (size  0) {
  ...
   }

Though at this point, we don't actually make use of
the distinction between -ENODEV vs 0.  Guess that's what
we should be thinking about.  Seems like there some details that
need to be sorted out, and some verification that consumers aren't
broken by outputting smaller notes -- e.g., ia64 makes me
wonder that.

Maybe we should leave this for another day, and have tm_spr_active
return 0 instead of -ENODEV when the machine doesn't have the hardware,
or not install that hook at all.  Seems like the effect will be the same,
as the note isn't output if -get fails.

 void *data = kmalloc(size, GFP_KERNEL);
 if (unlikely(!data))
 return 0;
 

 Now coming to the installation of the .active hooks part for all the new 
 regsets, it
 should be pretty straight forward as well. Though its optional and used for 
 elf_core_dump
 purpose only, its worth adding them here. Example of an active function 
 should be something
 like this. The function is inexpensive as required.

 +static int tm_spr_active(struct task_struct *target,
 +   const struct user_regset *regset)
 +{
 +   if (!cpu_has_feature(CPU_FTR_TM))
 +   return -ENODEV;

 ... unfortunately this will do the wrong thing.
 
 I am not sure whether I understand this correctly. Are you saying that its 
 wrong to return
 -ENODEV in this case as above ?

No, sorry for not being clear.  The (...)'s were connected:

   do you plan on fixing this?  Otherwise ... ... unfortunately
this will do the wrong thing.

-- 
Pedro Alves
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-19 Thread Pedro Alves

On 05/19/2014 12:46 PM, Anshuman Khandual wrote:

>> > I couldn't actually find any arch that currently returns -ENODEV in
>> > the "active" hook.  I see that binfmt_elf.c doesn't handle
>> > regset->active() returning < 0.  Guess that may be why.  Looks like
>> > something that could be cleaned up, to me.
>> >
> Also it does not consider the return value of regset->active(t->task, regset)
> (whose objective is to figure out whether we need to request regset->n number
> of elements or less than that) in the subsequent call to regset->get function.

Indeed.

TBC, do you plan on fixing this?  Otherwise ...

> Now coming to the installation of the .active hooks part for all the new 
> regsets, it
> should be pretty straight forward as well. Though its optional and used for 
> elf_core_dump
> purpose only, its worth adding them here. Example of an active function 
> should be something
> like this. The function is inexpensive as required.
> 
> +static int tm_spr_active(struct task_struct *target,
> +   const struct user_regset *regset)
> +{
> +   if (!cpu_has_feature(CPU_FTR_TM))
> +   return -ENODEV;

... unfortunately this will do the wrong thing.

Thanks,
-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-19 Thread Pedro Alves

On 05/19/2014 12:46 PM, Anshuman Khandual wrote:

  I couldn't actually find any arch that currently returns -ENODEV in
  the active hook.  I see that binfmt_elf.c doesn't handle
  regset-active() returning  0.  Guess that may be why.  Looks like
  something that could be cleaned up, to me.
 
 Also it does not consider the return value of regset-active(t-task, regset)
 (whose objective is to figure out whether we need to request regset-n number
 of elements or less than that) in the subsequent call to regset-get function.

Indeed.

TBC, do you plan on fixing this?  Otherwise ...

 Now coming to the installation of the .active hooks part for all the new 
 regsets, it
 should be pretty straight forward as well. Though its optional and used for 
 elf_core_dump
 purpose only, its worth adding them here. Example of an active function 
 should be something
 like this. The function is inexpensive as required.
 
 +static int tm_spr_active(struct task_struct *target,
 +   const struct user_regset *regset)
 +{
 +   if (!cpu_has_feature(CPU_FTR_TM))
 +   return -ENODEV;

... unfortunately this will do the wrong thing.

Thanks,
-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

2014-05-16 Thread Pedro Alves

On 05/15/2014 03:36 PM, Denys Vlasenko wrote:
> On 05/14/2014 08:49 PM, Pedro Alves wrote:
>> I realized now that I responded to this.  Sorry about that.
>>
>> On 01/19/2014 03:29 PM, Oleg Nesterov wrote:
>>> On 01/19, Sergio Durigan Junior wrote:
>>>>
>>>> On Friday, January 10 2014, Oleg Nesterov wrote:
>>>>
>>>>> So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
>>>>> executes the "syscall" insn. What it should report?
>>>> [...]
>>>>> But what should syscall-exit do? Should it still report SIGSEGV as
>>>>> it currently does, or should it report _SYSCALL_EXIT instead (if
>>>>> PTRACE_O_SYSCALL_EXIT of course), or should it report both?
>>>>
>>>> Both only if _SYSCALL_EXIT is set.  Otherwise, stick to the current
>>>> behavior, I guess.
>>>
>>> OK, both. In which order? Probably _EXIT first. But this looks a bit
>>> strange. Suppose that the tracee reports _EXIT, then debugger does
>>> ptrace(PTRACE_CONT), should the tracee report SIGTRAP?
>>
>> Seems to me that this should be very much the same as fork/vfork/clone
>> event handling.  Those are triggered by a syscall anyway.  So, say:
>>
>> - ptrace(PTRACE_SINGLESTEP)
>> - the tracee executes the "syscall" insn, and the syscall is "clone".
>> - PTRACE_EVENT_FORK is reported.
>> - The debugger does ptrace(PTRACE_CONT).
>>
>> What should be reported?  What is reported now?
> 
> Yes, PTRACE_SINGLESTEP looks ambiguous. Just like PTRACE_SYSCALL is.

Well, we just need to _define_ what should happen in these cases,
then it's no longer ambiguous.

I tried that out under GDB, using 3.14.3-200.fc20.x86_64 (Fedora 20),
_without_ Sérgio's patches.  Here's what I see.  All this is
with PTRACE_O_TRACEFORK enabled, of course.

#1 - Enter the syscall with PTRACE_SINGLESTEP, then PTRACE_SINGLESTEP

The first PTRACE_SINGLESTEP triggers the PTRACE_EVENT_FORK event.
We see that PTRACE_EVENT_FORK leaves the PC pointing after
the syscall instruction:

Dump of assembler code from 0x373bcbca6a to 0x373bcbca74:
   0x00373bcbca6a <__libc_fork+170>:syscall
=> 0x00373bcbca6c <__libc_fork+172>:cmp$0xf000,%rax
   0x00373bcbca72 <__libc_fork+178>:ja 0x373bcbcbb6 
<__libc_fork+502>

Then after the second PTRACE_SINGLESTEP, the kernel reports a trap with
the PC still pointing at the insn after the "syscall".  IOW, the PC
doesn't move, and the instruction after the "syscall" insn hasn't
been executed yet.  But the PC did move if we consider last time
the thread was resumed while it was running userspace code.  So,
the "syscall" insn is finally fully executed.  I think this is
reasonable.

#2 - Enter the syscall with PTRACE_SINGLESTEP, then PTRACE_CONTINUE

After the PTRACE_SINGLESTEP, the state is the same as in #1.
That is, PC points after "syscall".  The kernel reports no trap
for the latter PTRACE_CONTINUE, the process just continues
free until the next event.

#3 - Enter syscall with PTRACE_CONTINUE, then PTRACE_SINGLESTEP

Like in #1 and #2, we see that PTRACE_EVENT_FORK leaves the
PC pointing after the syscall instruction.  The latter
PTRACE_SINGLESTEP behaves just like in #1.  IOW, we get
a trap, but the PC doesn't move -- the "syscall" insn is
finally fully executed.

#4 - Enter syscall with PTRACE_CONTINUE, then PTRACE_CONTINUE

Same as #2.  IOW, no spurious trap.

So it doesn't really matter whether the process enters
the syscall through PTRACE_SINGLESTEP vs PTRACE_CONTINUE.
What matters is the last resumption.  At least from the
ptracer's perspective, the kernel doesn't "remember" that
the process was PTRACE_SINGLESTEP'd on entry to the syscall.

I think this is the most reasonable behavior, and the
simplest for a ptracer, and very much what I prefer for GDB.

If that last PTRACE_CONTINUE would instead report a SIGTRAP for
the previous PTRACE_SINGLESTEP, then the ptracer would need to
keep track of more state, for no good reason.  If the ptracer
really wants the ptracee to continue single-stepping as if
the fork/vfork/clone wasn't caught with a magic event, then
it just continues issuing PTRACE_SINGLESTEP.

Now I think PTRACE_EVENT_SYSCALL_{ENTER,EXIT} should behave
just the same.  That is, this is the sequence I expect:

  -> PTRACE_SINGLESTEP/PTRACE_CONTINUE over "syscall"
 <- PTRACE_EVENT_SYSCALL_ENTER
-- PC points after "syscall"
  -> PTRACE_SINGLESTEP/PTRACE_CONTINUE
 <- PTRACE_EVENT_FORK
-- PC points after "syscall"
  -> PTRACE_SINGLESTEP/PTRACE_CONTINUE
 <- PTRACE_EVENT_SYSCALL_EXIT
-- PC points after "

Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

2014-05-16 Thread Pedro Alves

On 05/15/2014 03:36 PM, Denys Vlasenko wrote:
 On 05/14/2014 08:49 PM, Pedro Alves wrote:
 I realized now that I responded to this.  Sorry about that.

 On 01/19/2014 03:29 PM, Oleg Nesterov wrote:
 On 01/19, Sergio Durigan Junior wrote:

 On Friday, January 10 2014, Oleg Nesterov wrote:

 So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
 executes the syscall insn. What it should report?
 [...]
 But what should syscall-exit do? Should it still report SIGSEGV as
 it currently does, or should it report _SYSCALL_EXIT instead (if
 PTRACE_O_SYSCALL_EXIT of course), or should it report both?

 Both only if _SYSCALL_EXIT is set.  Otherwise, stick to the current
 behavior, I guess.

 OK, both. In which order? Probably _EXIT first. But this looks a bit
 strange. Suppose that the tracee reports _EXIT, then debugger does
 ptrace(PTRACE_CONT), should the tracee report SIGTRAP?

 Seems to me that this should be very much the same as fork/vfork/clone
 event handling.  Those are triggered by a syscall anyway.  So, say:

 - ptrace(PTRACE_SINGLESTEP)
 - the tracee executes the syscall insn, and the syscall is clone.
 - PTRACE_EVENT_FORK is reported.
 - The debugger does ptrace(PTRACE_CONT).

 What should be reported?  What is reported now?
 
 Yes, PTRACE_SINGLESTEP looks ambiguous. Just like PTRACE_SYSCALL is.

Well, we just need to _define_ what should happen in these cases,
then it's no longer ambiguous.

I tried that out under GDB, using 3.14.3-200.fc20.x86_64 (Fedora 20),
_without_ Sérgio's patches.  Here's what I see.  All this is
with PTRACE_O_TRACEFORK enabled, of course.

#1 - Enter the syscall with PTRACE_SINGLESTEP, then PTRACE_SINGLESTEP

The first PTRACE_SINGLESTEP triggers the PTRACE_EVENT_FORK event.
We see that PTRACE_EVENT_FORK leaves the PC pointing after
the syscall instruction:

Dump of assembler code from 0x373bcbca6a to 0x373bcbca74:
   0x00373bcbca6a __libc_fork+170:syscall
= 0x00373bcbca6c __libc_fork+172:cmp$0xf000,%rax
   0x00373bcbca72 __libc_fork+178:ja 0x373bcbcbb6 
__libc_fork+502

Then after the second PTRACE_SINGLESTEP, the kernel reports a trap with
the PC still pointing at the insn after the syscall.  IOW, the PC
doesn't move, and the instruction after the syscall insn hasn't
been executed yet.  But the PC did move if we consider last time
the thread was resumed while it was running userspace code.  So,
the syscall insn is finally fully executed.  I think this is
reasonable.

#2 - Enter the syscall with PTRACE_SINGLESTEP, then PTRACE_CONTINUE

After the PTRACE_SINGLESTEP, the state is the same as in #1.
That is, PC points after syscall.  The kernel reports no trap
for the latter PTRACE_CONTINUE, the process just continues
free until the next event.

#3 - Enter syscall with PTRACE_CONTINUE, then PTRACE_SINGLESTEP

Like in #1 and #2, we see that PTRACE_EVENT_FORK leaves the
PC pointing after the syscall instruction.  The latter
PTRACE_SINGLESTEP behaves just like in #1.  IOW, we get
a trap, but the PC doesn't move -- the syscall insn is
finally fully executed.

#4 - Enter syscall with PTRACE_CONTINUE, then PTRACE_CONTINUE

Same as #2.  IOW, no spurious trap.



So it doesn't really matter whether the process enters
the syscall through PTRACE_SINGLESTEP vs PTRACE_CONTINUE.
What matters is the last resumption.  At least from the
ptracer's perspective, the kernel doesn't remember that
the process was PTRACE_SINGLESTEP'd on entry to the syscall.

I think this is the most reasonable behavior, and the
simplest for a ptracer, and very much what I prefer for GDB.

If that last PTRACE_CONTINUE would instead report a SIGTRAP for
the previous PTRACE_SINGLESTEP, then the ptracer would need to
keep track of more state, for no good reason.  If the ptracer
really wants the ptracee to continue single-stepping as if
the fork/vfork/clone wasn't caught with a magic event, then
it just continues issuing PTRACE_SINGLESTEP.

Now I think PTRACE_EVENT_SYSCALL_{ENTER,EXIT} should behave
just the same.  That is, this is the sequence I expect:

  - PTRACE_SINGLESTEP/PTRACE_CONTINUE over syscall
 - PTRACE_EVENT_SYSCALL_ENTER
-- PC points after syscall
  - PTRACE_SINGLESTEP/PTRACE_CONTINUE
 - PTRACE_EVENT_FORK
-- PC points after syscall
  - PTRACE_SINGLESTEP/PTRACE_CONTINUE
 - PTRACE_EVENT_SYSCALL_EXIT
-- PC points after syscall

And then after, either:

a) PTRACE_SINGLESTEP
 - SIGTRAP
-- PC points after syscall

b) PTRACE_CONTINUE
 - *nothing*


Of course, skip the PTRACE_EVENT_FORK step in the middle
of the sequence if PTRACE_O_TRACEFORK is not enabled.

Thanks,
-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-15 Thread Pedro Alves

On 05/15/2014 09:25 AM, Anshuman Khandual wrote:
> On 05/14/2014 04:45 PM, Pedro Alves wrote:
>> On 05/14/14 06:46, Anshuman Khandual wrote:
>>> On 05/13/2014 10:43 PM, Pedro Alves wrote:
>>>> On 05/05/14 08:54, Anshuman Khandual wrote:
>>>>> This patch enables get and set of transactional memory related register
>>>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
>>>>> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
>>>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
>>>>> ELF core note types added previously in this regard.
>>>>>
>>>>>   (1) NT_PPC_TM_SPR
>>>>>   (2) NT_PPC_TM_CGPR
>>>>>   (3) NT_PPC_TM_CFPR
>>>>>   (4) NT_PPC_TM_CVMX
>>>>
>>>> Sorry that I couldn't tell this from the code, but, what does the
>>>> kernel return when the ptracer requests these registers and the
>>>> program is not in a transaction?  Specifically I'm wondering whether
>>>> this follows the same semantics as the s390 port.
>>>>
>>>
>>> Right now, it still returns the saved state of the registers from thread
>>> struct. I had assumed that the user must know the state of the transaction
>>> before initiating the ptrace request. I guess its better to check for
>>> the transaction status before processing the request. In case if TM is not
>>> active on that thread, we should return -EINVAL.
>>
>> I think s390 returns ENODATA in that case.
>>
>>  https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html
>>
>> We'll want some way to tell whether the system actually
>> supports this.  That could be ENODATA vs something-else (EINVAL
>> or perhaps better EIO for "request is invalid").
> 
> As Mickey has pointed out, the transaction memory support in the system can be
> checked from the HWCAP2 flags. So when the transaction is not active, we will
> return ENODATA instead for TM related ptrace regset requests.

Returning ENODATA when the transaction is not active, like
s390 is great.  Thank you.

But I think it's worth it to consider what should the kernel
return when the machine doesn't have these registers at all.

Sure, for this case we happen to have the hwcap flag.  But in
general, I don't know whether we will always have a hwcap bit
for each register set that is added.  Maybe we will, so that
the info ends up in core dumps.

Still, I think it's worth to consider this case in the
general sense, irrespective of hwcap.

That is, what should PTRACE_GETREGSET/PTRACE_SETREGSET return
when the machine doesn't have the registers at all.  We shouldn't
need to consult something elsewhere (like hwcap) to determine
what ENODATA means.  The kernel knows it right there.  I think
s390 goofed here.

Taking a look at x86, for example, we see:

[REGSET_XSTATE] = {
.core_note_type = NT_X86_XSTATE,
.size = sizeof(u64), .align = sizeof(u64),
.active = xstateregs_active, .get = xstateregs_get,
.set = xstateregs_set
},

Note that it installs the ".active" hook.

 24 /**
 25  * user_regset_active_fn - type of @active function in  user_regset
 26  * @target: thread being examined
 27  * @regset: regset being examined
 28  *
 29  * Return -%ENODEV if not available on the hardware found.
 30  * Return %0 if no interesting state in this thread.
 31  * Return >%0 number of @size units of interesting state.
 32  * Any get call fetching state beyond that number will
 33  * see the default initialization state for this data,
 34  * so a caller that knows what the default state is need
 35  * not copy it all out.
 36  * This call is optional; the pointer is %NULL if there
 37  * is no inexpensive check to yield a value < @n.
 38  */
 39 typedef int user_regset_active_fn(struct task_struct *target,
 40   const struct user_regset *regset);
 41

Note the mention of ENODEV.

I couldn't actually find any arch that currently returns -ENODEV in
the "active" hook.  I see that binfmt_elf.c doesn't handle
regset->active() returning < 0.  Guess that may be why.  Looks like
something that could be cleaned up, to me.

Anyway, notice x86's REGSET_XSTATE regset->get hook:

int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
int ret;

if (!cpu_has_xsave)
return -ENODEV;
^^

And then we see that xfpregs_get has a similar ENODEV case.

So in sum, it very much looks like the intention is

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-15 Thread Pedro Alves

On 05/15/2014 09:25 AM, Anshuman Khandual wrote:
 On 05/14/2014 04:45 PM, Pedro Alves wrote:
 On 05/14/14 06:46, Anshuman Khandual wrote:
 On 05/13/2014 10:43 PM, Pedro Alves wrote:
 On 05/05/14 08:54, Anshuman Khandual wrote:
 This patch enables get and set of transactional memory related register
 sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
 four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
 REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
 ELF core note types added previously in this regard.

   (1) NT_PPC_TM_SPR
   (2) NT_PPC_TM_CGPR
   (3) NT_PPC_TM_CFPR
   (4) NT_PPC_TM_CVMX

 Sorry that I couldn't tell this from the code, but, what does the
 kernel return when the ptracer requests these registers and the
 program is not in a transaction?  Specifically I'm wondering whether
 this follows the same semantics as the s390 port.


 Right now, it still returns the saved state of the registers from thread
 struct. I had assumed that the user must know the state of the transaction
 before initiating the ptrace request. I guess its better to check for
 the transaction status before processing the request. In case if TM is not
 active on that thread, we should return -EINVAL.

 I think s390 returns ENODATA in that case.

  https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

 We'll want some way to tell whether the system actually
 supports this.  That could be ENODATA vs something-else (EINVAL
 or perhaps better EIO for request is invalid).
 
 As Mickey has pointed out, the transaction memory support in the system can be
 checked from the HWCAP2 flags. So when the transaction is not active, we will
 return ENODATA instead for TM related ptrace regset requests.

Returning ENODATA when the transaction is not active, like
s390 is great.  Thank you.

But I think it's worth it to consider what should the kernel
return when the machine doesn't have these registers at all.

Sure, for this case we happen to have the hwcap flag.  But in
general, I don't know whether we will always have a hwcap bit
for each register set that is added.  Maybe we will, so that
the info ends up in core dumps.

Still, I think it's worth to consider this case in the
general sense, irrespective of hwcap.

That is, what should PTRACE_GETREGSET/PTRACE_SETREGSET return
when the machine doesn't have the registers at all.  We shouldn't
need to consult something elsewhere (like hwcap) to determine
what ENODATA means.  The kernel knows it right there.  I think
s390 goofed here.

Taking a look at x86, for example, we see:

[REGSET_XSTATE] = {
.core_note_type = NT_X86_XSTATE,
.size = sizeof(u64), .align = sizeof(u64),
.active = xstateregs_active, .get = xstateregs_get,
.set = xstateregs_set
},

Note that it installs the .active hook.

 24 /**
 25  * user_regset_active_fn - type of @active function in struct user_regset
 26  * @target: thread being examined
 27  * @regset: regset being examined
 28  *
 29  * Return -%ENODEV if not available on the hardware found.
 30  * Return %0 if no interesting state in this thread.
 31  * Return %0 number of @size units of interesting state.
 32  * Any get call fetching state beyond that number will
 33  * see the default initialization state for this data,
 34  * so a caller that knows what the default state is need
 35  * not copy it all out.
 36  * This call is optional; the pointer is %NULL if there
 37  * is no inexpensive check to yield a value  @n.
 38  */
 39 typedef int user_regset_active_fn(struct task_struct *target,
 40   const struct user_regset *regset);
 41

Note the mention of ENODEV.

I couldn't actually find any arch that currently returns -ENODEV in
the active hook.  I see that binfmt_elf.c doesn't handle
regset-active() returning  0.  Guess that may be why.  Looks like
something that could be cleaned up, to me.

Anyway, notice x86's REGSET_XSTATE regset-get hook:

int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
int ret;

if (!cpu_has_xsave)
return -ENODEV;
^^

And then we see that xfpregs_get has a similar ENODEV case.

So in sum, it very much looks like the intention is for
PTRACE_GETREGSET/PTRACE_SETREGSET to return ENODEV in the
case the regset doesn't exist on the running machine, and then
it looks like at least x86 works that way.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

2014-05-14 Thread Pedro Alves

I realized now that I responded to this.  Sorry about that.

On 01/19/2014 03:29 PM, Oleg Nesterov wrote:
> On 01/19, Sergio Durigan Junior wrote:
>>
>> On Friday, January 10 2014, Oleg Nesterov wrote:
>>
>>> So suppose that gdb does ptrace(PTRACE_SINGLESTEP) and the tracee
>>> executes the "syscall" insn. What it should report?
>> [...]
>>> But what should syscall-exit do? Should it still report SIGSEGV as
>>> it currently does, or should it report _SYSCALL_EXIT instead (if
>>> PTRACE_O_SYSCALL_EXIT of course), or should it report both?
>>
>> Both only if _SYSCALL_EXIT is set.  Otherwise, stick to the current
>> behavior, I guess.
> 
> OK, both. In which order? Probably _EXIT first. But this looks a bit
> strange. Suppose that the tracee reports _EXIT, then debugger does
> ptrace(PTRACE_CONT), should the tracee report SIGTRAP?

Seems to me that this should be very much the same as fork/vfork/clone
event handling.  Those are triggered by a syscall anyway.  So, say:

- ptrace(PTRACE_SINGLESTEP)
- the tracee executes the "syscall" insn, and the syscall is "clone".
- PTRACE_EVENT_FORK is reported.
- The debugger does ptrace(PTRACE_CONT).

What should be reported?  What is reported now?

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-14 Thread Pedro Alves

On 05/14/14 12:18, Michael Neuling wrote:
> 
>> s390 actually screwed that, though it got away because
>> there's a bit in HWCAP to signal transactions support.  See:
>>
>>  https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html
>>
>> Are you adding something to HWCAP too?
> 
> Yes but it's in HWCAP2

That's fine.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-14 Thread Pedro Alves

On 05/14/14 06:46, Anshuman Khandual wrote:
> On 05/13/2014 10:43 PM, Pedro Alves wrote:
>> On 05/05/14 08:54, Anshuman Khandual wrote:
>>> This patch enables get and set of transactional memory related register
>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
>>> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
>>> ELF core note types added previously in this regard.
>>>
>>> (1) NT_PPC_TM_SPR
>>> (2) NT_PPC_TM_CGPR
>>> (3) NT_PPC_TM_CFPR
>>> (4) NT_PPC_TM_CVMX
>>
>> Sorry that I couldn't tell this from the code, but, what does the
>> kernel return when the ptracer requests these registers and the
>> program is not in a transaction?  Specifically I'm wondering whether
>> this follows the same semantics as the s390 port.
>>
> 
> Right now, it still returns the saved state of the registers from thread
> struct. I had assumed that the user must know the state of the transaction
> before initiating the ptrace request. I guess its better to check for
> the transaction status before processing the request. In case if TM is not
> active on that thread, we should return -EINVAL.

I think s390 returns ENODATA in that case.

 https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

We'll want some way to tell whether the system actually
supports this.  That could be ENODATA vs something-else (EINVAL
or perhaps better EIO for "request is invalid").

s390 actually screwed that, though it got away because
there's a bit in HWCAP to signal transactions support.  See:

 https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html

Are you adding something to HWCAP too?

> 
> I am not familiar with the s390 side of code. But if we look at the
> s390_tdb_get function it checks for (regs->int_code & 0x200) before
> processing the request. Not sure what 0x200 signifies though.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header

2014-05-14 Thread Pedro Alves

On 05/14/14 08:10, Anshuman Khandual wrote:
> On 05/13/2014 11:39 PM, Pedro Alves wrote:
>> On 05/05/14 05:10, Anshuman Khandual wrote:
>>> On 05/01/2014 07:43 PM, Pedro Alves wrote:
>> OK, then this is what I suggest instead:
...
>>> Shall I resend the patch with the your proposed changes and your 
>>> "Signed-off-by" and
>>> moving myself as "Reported-by" ?
>>
>> No idea of the actual policy to follow.  Feel free to do that if that's the
>> standard procedure.
> 
> Even I am not sure about this, so to preserve the correct authorship, would 
> you
> mind sending this patch ?

Here you go.  This is against current Linus'.  Please take it from
here if necessary.

8<------
>From 1237f5ac5896f3910f66df83a5093bb548006188 Mon Sep 17 00:00:00 2001
From: Pedro Alves 
Date: Wed, 14 May 2014 11:05:07 +0100
Subject: [PATCH] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET
 documentation in uapi header

The current comments don't explicitly state in plain words that
iov.len must be set to the buffer's length prior to the ptrace call.
A user might get confused and leave that uninitialized.

In the ptrace_regset function (snippet below) we see that the buffer
length has to be a multiple of the slot/register size for the given
NT_XXX_TYPE:

if (!regset || (kiov->iov_len % regset->size) != 0)
return -EINVAL;

Note regset->size is the size of each slot/register in the set, not
the size of the whole set.

And then, we see here:

 kiov->iov_len = min(kiov->iov_len,
(__kernel_size_t) (regset->n * regset->size));

that the kernel takes care of capping the requested length to the size
of the whole regset.

Signed-off-by: Pedro Alves 
Reported-by: Anshuman Khandual 
---
 include/uapi/linux/ptrace.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..30836b9 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -39,12 +39,17 @@
  * payload are exactly the same layout.
  *
  * This interface usage is as follows:
- * struct iovec iov = { buf, len};
+ * struct iovec iov = { buf, len };
  *
  * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, );
  *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * On entry, iov describes the buffer's address and length.  The buffer's 
length
+ * must be a multiple of the size of a single register in the register set.  
The
+ * kernel never reads or writes more than iov.len, and caps the buffer length 
to
+ * the register set's size.  In other words, the kernel reads or writes
+ * min(iov.len, regset size).  On successful completion, iov.len is updated by
+ * the kernel, specifying how much the kernel has read from / written to the
+ * user's iov.buf.
  */
 #define PTRACE_GETREGSET   0x4204
 #define PTRACE_SETREGSET   0x4205
-- 
1.9.0


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header

2014-05-14 Thread Pedro Alves

On 05/14/14 08:10, Anshuman Khandual wrote:
 On 05/13/2014 11:39 PM, Pedro Alves wrote:
 On 05/05/14 05:10, Anshuman Khandual wrote:
 On 05/01/2014 07:43 PM, Pedro Alves wrote:
 OK, then this is what I suggest instead:
...
 Shall I resend the patch with the your proposed changes and your 
 Signed-off-by and
 moving myself as Reported-by ?

 No idea of the actual policy to follow.  Feel free to do that if that's the
 standard procedure.
 
 Even I am not sure about this, so to preserve the correct authorship, would 
 you
 mind sending this patch ?

Here you go.  This is against current Linus'.  Please take it from
here if necessary.

8--
From 1237f5ac5896f3910f66df83a5093bb548006188 Mon Sep 17 00:00:00 2001
From: Pedro Alves pal...@redhat.com
Date: Wed, 14 May 2014 11:05:07 +0100
Subject: [PATCH] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET
 documentation in uapi header

The current comments don't explicitly state in plain words that
iov.len must be set to the buffer's length prior to the ptrace call.
A user might get confused and leave that uninitialized.

In the ptrace_regset function (snippet below) we see that the buffer
length has to be a multiple of the slot/register size for the given
NT_XXX_TYPE:

if (!regset || (kiov-iov_len % regset-size) != 0)
return -EINVAL;

Note regset-size is the size of each slot/register in the set, not
the size of the whole set.

And then, we see here:

 kiov-iov_len = min(kiov-iov_len,
(__kernel_size_t) (regset-n * regset-size));

that the kernel takes care of capping the requested length to the size
of the whole regset.

Signed-off-by: Pedro Alves pal...@redhat.com
Reported-by: Anshuman Khandual khand...@linux.vnet.ibm.com
---
 include/uapi/linux/ptrace.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..30836b9 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -39,12 +39,17 @@
  * payload are exactly the same layout.
  *
  * This interface usage is as follows:
- * struct iovec iov = { buf, len};
+ * struct iovec iov = { buf, len };
  *
  * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, iov);
  *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * On entry, iov describes the buffer's address and length.  The buffer's 
length
+ * must be a multiple of the size of a single register in the register set.  
The
+ * kernel never reads or writes more than iov.len, and caps the buffer length 
to
+ * the register set's size.  In other words, the kernel reads or writes
+ * min(iov.len, regset size).  On successful completion, iov.len is updated by
+ * the kernel, specifying how much the kernel has read from / written to the
+ * user's iov.buf.
  */
 #define PTRACE_GETREGSET   0x4204
 #define PTRACE_SETREGSET   0x4205
-- 
1.9.0


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-14 Thread Pedro Alves

On 05/14/14 06:46, Anshuman Khandual wrote:
 On 05/13/2014 10:43 PM, Pedro Alves wrote:
 On 05/05/14 08:54, Anshuman Khandual wrote:
 This patch enables get and set of transactional memory related register
 sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
 four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
 REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
 ELF core note types added previously in this regard.

 (1) NT_PPC_TM_SPR
 (2) NT_PPC_TM_CGPR
 (3) NT_PPC_TM_CFPR
 (4) NT_PPC_TM_CVMX

 Sorry that I couldn't tell this from the code, but, what does the
 kernel return when the ptracer requests these registers and the
 program is not in a transaction?  Specifically I'm wondering whether
 this follows the same semantics as the s390 port.

 
 Right now, it still returns the saved state of the registers from thread
 struct. I had assumed that the user must know the state of the transaction
 before initiating the ptrace request. I guess its better to check for
 the transaction status before processing the request. In case if TM is not
 active on that thread, we should return -EINVAL.

I think s390 returns ENODATA in that case.

 https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

We'll want some way to tell whether the system actually
supports this.  That could be ENODATA vs something-else (EINVAL
or perhaps better EIO for request is invalid).

s390 actually screwed that, though it got away because
there's a bit in HWCAP to signal transactions support.  See:

 https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html

Are you adding something to HWCAP too?

 
 I am not familiar with the s390 side of code. But if we look at the
 s390_tdb_get function it checks for (regs-int_code  0x200) before
 processing the request. Not sure what 0x200 signifies though.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-14 Thread Pedro Alves

On 05/14/14 12:18, Michael Neuling wrote:
 
 s390 actually screwed that, though it got away because
 there's a bit in HWCAP to signal transactions support.  See:

  https://sourceware.org/ml/gdb-patches/2013-11/msg00080.html

 Are you adding something to HWCAP too?
 
 Yes but it's in HWCAP2

That's fine.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation

2014-05-13 Thread Pedro Alves

On 05/05/14 05:10, Anshuman Khandual wrote:
> On 05/01/2014 07:43 PM, Pedro Alves wrote:
>> On 04/28/2014 12:00 PM, Anshuman Khandual wrote:
>>> The current documentation is bit misleading and does not explicitly
>>> specify that iov.len need to be initialized failing which kernel
>>> may just ignore the ptrace request and never read from/write into
>>> the user specified buffer. This patch fixes the documentation.
>>
>> Well, it kind of does, here:
>>
>> *  struct iovec iov = { buf, len};
> 
> :) Thats not explicit enough.
> 
>>
>>> @@ -43,8 +43,12 @@
>>>   *
>>>   * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, );
>>>   *
>>> - * On the successful completion, iov.len will be updated by the kernel,
>>> - * specifying how much the kernel has written/read to/from the user's 
>>> iov.buf.
>>> + * A non-zero value upto the max size of data expected to be written/read 
>>> by the
>>> + * kernel in response to any NT_XXX_TYPE request type must be assigned to 
>>> iov.len
>>> + * before initiating the ptrace call. If iov.len is 0, then kernel will 
>>> neither
>>> + * read from or write into the user buffer specified. On successful 
>>> completion,
>>> + * iov.len will be updated by the kernel, specifying how much the kernel 
>>> has
>>> + * written/read to/from the user's iov.buf.
>>
>> I really appreciate that you're trying to make this clearer, but I
>> find the new sentence very hard to read/reason.  :-/
>>
>> I suggest:
>>
>>  * This interface usage is as follows:
>> - *  struct iovec iov = { buf, len};
>> + *  struct iovec iov = { buf, len };
>>  *
>>  *  ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, 
>> );
>>  *
>> - * On the successful completion, iov.len will be updated by the kernel,
>> - * specifying how much the kernel has written/read to/from the user's 
>> iov.buf.
>> + * On entry, iov describes the buffer's address and length.  The buffer's
>> + * length must be equal to or shorter than the size of the NT_XXX_TYPE 
>> regset.
>> + * On successful completion, iov.len is updated by the kernel, specifying 
>> how
>> + * much the kernel has written/read to/from the user's iov.buf.
>>
> 
> Yeah, sounds better. I may add "If the length is zero, the kernel will 
> neither read
> from or write into the buffer"

Well, I think that much should be obvious.  What's not obvious is
whether that is considered success or error (what is the return code?)
I suspect and expect success return if the regset type is known, and
error otherwise.  So that could be used as a way to probe for support
for a given regset without using stack or heap space, if it ever matters.
The kernel never reads/writes beyond iov.len, so better say that, and
then it automatically gets the 0 case handled too, right?

>> I'm not sure I understood what you're saying correctly, though.  
>> Specifically,
>> I don't know whether the buffer's length must really be shorter than the
>> size of the NT_XXX_TYPE regset.
> 
> No, it does not have to. From the code snippet below (ptrace_regset function)
> the buffer length has to be multiple of regset->size for the given NT_XXX_TYPE
> upto the max regset size for the user to see any valid data.

Ah, I guess one could call it a bug.  If the passed in
len is bigger than the whole register set size, then there seems
to be no point in validating whether the length is multiple of
a single register's size.  That unnecessarily prevents coming up
with a register set in the future that has registers of
different sizes...

But given that that's how things are today, I suppose we should
document it...

 The problem what I
> faced was when you use any iovec structure with the length parameter 
> uninitialized,
> the kernel simply ignores and does not return anything.

Ah.  Well, saying "does not return anything" is quite confusing.  It does
return something -- -EINVAL.

> 
> if (!regset || (kiov->iov_len % regset->size) != 0)
> return -EINVAL;

>  
>>
>>> The current documentation is bit misleading and does not explicitly
>>> specify that iov.len need to be initialized failing which kernel
>>> may just ignore the ptrace request and never read from/write into
>>> the user specified buffer.
>>
>> You're saying that if iov.len is larger than the NT_XXX_TYPE regset,
>> then the kernel returns _success_, but actually doesn't fill the
>> buffer?  That sounds like a bug to me.
> 
> No, I am not

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-13 Thread Pedro Alves

I wonder whether people are getting Roland's address from?

It's frequent that ptrace related patches end up CCed to
rol...@redhat.com, but, he's not been at Red Hat for a few years
now.  Roland, do you still want to be CCed on ptrace-related
issues?  If so, there's probably a script somewhere in the
kernel that needs updating.  If not, well, it'd be good
if it were updated anyway.  :-)

It's a little annoying, as Red Hat's servers outright reject
email sent from a @redhat.com address if one tries to send
an email that includes a CC/FROM to a user that no longer
exists in the @redhat.com domain.

-- 
Pedro Alves

On 05/05/14 08:54, Anshuman Khandual wrote:
> This patch enables get and set of transactional memory related register
> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
> ELF core note types added previously in this regard.
> 
>   (1) NT_PPC_TM_SPR
>   (2) NT_PPC_TM_CGPR
>   (3) NT_PPC_TM_CFPR
>   (4) NT_PPC_TM_CVMX
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/include/asm/switch_to.h |   8 +
>  arch/powerpc/kernel/process.c|  24 ++
>  arch/powerpc/kernel/ptrace.c | 683 
> +--
>  3 files changed, 687 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/switch_to.h 
> b/arch/powerpc/include/asm/switch_to.h
> index 0e83e7d..2737f46 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct 
> *t)
>  }
>  #endif
>  
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +extern void flush_tmregs_to_thread(struct task_struct *);
> +#else
> +static inline void flush_tmregs_to_thread(struct task_struct *t)
> +{
> +}
> +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
> +
>  static inline void clear_task_ebb(struct task_struct *t)
>  {
>  #ifdef CONFIG_PPC_BOOK3S_64
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 31d0215..e247898 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct 
> *prev)
>   }
>  }
>  
> +void flush_tmregs_to_thread(struct task_struct *tsk)
> +{
> + /*
> +  * If task is not current, it should have been flushed
> +  * already to it's thread_struct during __switch_to().
> +  */
> + if (tsk != current)
> + return;
> +
> + preempt_disable();
> + if (tsk->thread.regs) {
> + /*
> +  * If we are still current, the TM state need to
> +  * be flushed to thread_struct as it will be still
> +  * present in the current cpu.
> +  */
> + if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) {
> + __switch_to_tm(tsk);
> + tm_recheckpoint_new_task(tsk);
> + }
> + }
> + preempt_enable();
> +}
> +
>  /*
>   * This is called if we are on the way out to userspace and the
>   * TIF_RESTORE_TM flag is set.  It checks if we need to reload
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 2e3d2bf..92faded 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const 
> struct user_regset *regset,
>   return ret;
>  }
>  
> +/*
> + * When any transaction is active, "thread_struct->transact_fp" holds
> + * the current running value of all FPR registers and "thread_struct->
> + * fp_state" holds the last checkpointed FPR registers state for the
> + * current transaction.
> + *
> + * struct data {
> + *   u64 fpr[32];
> + *   u64 fpscr;
> + * };
> + */
>  static int fpr_get(struct task_struct *target, const struct user_regset 
> *regset,
>  unsigned int pos, unsigned int count,
>  void *kbuf, void __user *ubuf)
> @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const 
> struct user_regset *regset,
>   u64 buf[33];
>   int i;
>  #endif
> - flush_fp_to_thread(target);
> + if (MSR_TM_ACTIVE(target->thread.regs->msr)) {
> + flush_fp_to_thread(target);
> + flush_altivec_to_thread(target);
> + flush_tmregs_to_thread(target);
> + } else {
> + flush_fp_to_thread(target);
> + }
>  
>  #ifdef CONFIG_V

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-13 Thread Pedro Alves

On 05/05/14 08:54, Anshuman Khandual wrote:
> This patch enables get and set of transactional memory related register
> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
> ELF core note types added previously in this regard.
> 
>   (1) NT_PPC_TM_SPR
>   (2) NT_PPC_TM_CGPR
>   (3) NT_PPC_TM_CFPR
>   (4) NT_PPC_TM_CVMX

Sorry that I couldn't tell this from the code, but, what does the
kernel return when the ptracer requests these registers and the
program is not in a transaction?  Specifically I'm wondering whether
this follows the same semantics as the s390 port.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-13 Thread Pedro Alves

On 05/05/14 08:54, Anshuman Khandual wrote:
 This patch enables get and set of transactional memory related register
 sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
 four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
 REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
 ELF core note types added previously in this regard.
 
   (1) NT_PPC_TM_SPR
   (2) NT_PPC_TM_CGPR
   (3) NT_PPC_TM_CFPR
   (4) NT_PPC_TM_CVMX

Sorry that I couldn't tell this from the code, but, what does the
kernel return when the ptracer requests these registers and the
program is not in a transaction?  Specifically I'm wondering whether
this follows the same semantics as the s390 port.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets

2014-05-13 Thread Pedro Alves

I wonder whether people are getting Roland's address from?

It's frequent that ptrace related patches end up CCed to
rol...@redhat.com, but, he's not been at Red Hat for a few years
now.  Roland, do you still want to be CCed on ptrace-related
issues?  If so, there's probably a script somewhere in the
kernel that needs updating.  If not, well, it'd be good
if it were updated anyway.  :-)

It's a little annoying, as Red Hat's servers outright reject
email sent from a @redhat.com address if one tries to send
an email that includes a CC/FROM to a user that no longer
exists in the @redhat.com domain.

-- 
Pedro Alves

On 05/05/14 08:54, Anshuman Khandual wrote:
 This patch enables get and set of transactional memory related register
 sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing
 four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR,
 REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new
 ELF core note types added previously in this regard.
 
   (1) NT_PPC_TM_SPR
   (2) NT_PPC_TM_CGPR
   (3) NT_PPC_TM_CFPR
   (4) NT_PPC_TM_CVMX
 
 Signed-off-by: Anshuman Khandual khand...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/switch_to.h |   8 +
  arch/powerpc/kernel/process.c|  24 ++
  arch/powerpc/kernel/ptrace.c | 683 
 +--
  3 files changed, 687 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/switch_to.h 
 b/arch/powerpc/include/asm/switch_to.h
 index 0e83e7d..2737f46 100644
 --- a/arch/powerpc/include/asm/switch_to.h
 +++ b/arch/powerpc/include/asm/switch_to.h
 @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct 
 *t)
  }
  #endif
  
 +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 +extern void flush_tmregs_to_thread(struct task_struct *);
 +#else
 +static inline void flush_tmregs_to_thread(struct task_struct *t)
 +{
 +}
 +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 +
  static inline void clear_task_ebb(struct task_struct *t)
  {
  #ifdef CONFIG_PPC_BOOK3S_64
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index 31d0215..e247898 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -695,6 +695,30 @@ static inline void __switch_to_tm(struct task_struct 
 *prev)
   }
  }
  
 +void flush_tmregs_to_thread(struct task_struct *tsk)
 +{
 + /*
 +  * If task is not current, it should have been flushed
 +  * already to it's thread_struct during __switch_to().
 +  */
 + if (tsk != current)
 + return;
 +
 + preempt_disable();
 + if (tsk-thread.regs) {
 + /*
 +  * If we are still current, the TM state need to
 +  * be flushed to thread_struct as it will be still
 +  * present in the current cpu.
 +  */
 + if (MSR_TM_ACTIVE(tsk-thread.regs-msr)) {
 + __switch_to_tm(tsk);
 + tm_recheckpoint_new_task(tsk);
 + }
 + }
 + preempt_enable();
 +}
 +
  /*
   * This is called if we are on the way out to userspace and the
   * TIF_RESTORE_TM flag is set.  It checks if we need to reload
 diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
 index 2e3d2bf..92faded 100644
 --- a/arch/powerpc/kernel/ptrace.c
 +++ b/arch/powerpc/kernel/ptrace.c
 @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const 
 struct user_regset *regset,
   return ret;
  }
  
 +/*
 + * When any transaction is active, thread_struct-transact_fp holds
 + * the current running value of all FPR registers and thread_struct-
 + * fp_state holds the last checkpointed FPR registers state for the
 + * current transaction.
 + *
 + * struct data {
 + *   u64 fpr[32];
 + *   u64 fpscr;
 + * };
 + */
  static int fpr_get(struct task_struct *target, const struct user_regset 
 *regset,
  unsigned int pos, unsigned int count,
  void *kbuf, void __user *ubuf)
 @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const 
 struct user_regset *regset,
   u64 buf[33];
   int i;
  #endif
 - flush_fp_to_thread(target);
 + if (MSR_TM_ACTIVE(target-thread.regs-msr)) {
 + flush_fp_to_thread(target);
 + flush_altivec_to_thread(target);
 + flush_tmregs_to_thread(target);
 + } else {
 + flush_fp_to_thread(target);
 + }
  
  #ifdef CONFIG_VSX
   /* copy to local buffer then write that out */
 - for (i = 0; i  32 ; i++)
 - buf[i] = target-thread.TS_FPR(i);
 - buf[32] = target-thread.fp_state.fpscr;
 + if (MSR_TM_ACTIVE(target-thread.regs-msr)) {
 + for (i = 0; i  32 ; i++)
 + buf[i] = target-thread.TS_TRANS_FPR(i);
 + buf[32] = target-thread.transact_fp.fpscr;
 + } else {
 + for (i = 0; i  32 ; i++)
 + buf[i] = target

Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation

2014-05-13 Thread Pedro Alves

On 05/05/14 05:10, Anshuman Khandual wrote:
 On 05/01/2014 07:43 PM, Pedro Alves wrote:
 On 04/28/2014 12:00 PM, Anshuman Khandual wrote:
 The current documentation is bit misleading and does not explicitly
 specify that iov.len need to be initialized failing which kernel
 may just ignore the ptrace request and never read from/write into
 the user specified buffer. This patch fixes the documentation.

 Well, it kind of does, here:

 *  struct iovec iov = { buf, len};
 
 :) Thats not explicit enough.
 

 @@ -43,8 +43,12 @@
   *
   * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, iov);
   *
 - * On the successful completion, iov.len will be updated by the kernel,
 - * specifying how much the kernel has written/read to/from the user's 
 iov.buf.
 + * A non-zero value upto the max size of data expected to be written/read 
 by the
 + * kernel in response to any NT_XXX_TYPE request type must be assigned to 
 iov.len
 + * before initiating the ptrace call. If iov.len is 0, then kernel will 
 neither
 + * read from or write into the user buffer specified. On successful 
 completion,
 + * iov.len will be updated by the kernel, specifying how much the kernel 
 has
 + * written/read to/from the user's iov.buf.

 I really appreciate that you're trying to make this clearer, but I
 find the new sentence very hard to read/reason.  :-/

 I suggest:

  * This interface usage is as follows:
 - *  struct iovec iov = { buf, len};
 + *  struct iovec iov = { buf, len };
  *
  *  ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, 
 iov);
  *
 - * On the successful completion, iov.len will be updated by the kernel,
 - * specifying how much the kernel has written/read to/from the user's 
 iov.buf.
 + * On entry, iov describes the buffer's address and length.  The buffer's
 + * length must be equal to or shorter than the size of the NT_XXX_TYPE 
 regset.
 + * On successful completion, iov.len is updated by the kernel, specifying 
 how
 + * much the kernel has written/read to/from the user's iov.buf.

 
 Yeah, sounds better. I may add If the length is zero, the kernel will 
 neither read
 from or write into the buffer

Well, I think that much should be obvious.  What's not obvious is
whether that is considered success or error (what is the return code?)
I suspect and expect success return if the regset type is known, and
error otherwise.  So that could be used as a way to probe for support
for a given regset without using stack or heap space, if it ever matters.
The kernel never reads/writes beyond iov.len, so better say that, and
then it automatically gets the 0 case handled too, right?

 I'm not sure I understood what you're saying correctly, though.  
 Specifically,
 I don't know whether the buffer's length must really be shorter than the
 size of the NT_XXX_TYPE regset.
 
 No, it does not have to. From the code snippet below (ptrace_regset function)
 the buffer length has to be multiple of regset-size for the given NT_XXX_TYPE
 upto the max regset size for the user to see any valid data.

Ah, I guess one could call it a bug.  If the passed in
len is bigger than the whole register set size, then there seems
to be no point in validating whether the length is multiple of
a single register's size.  That unnecessarily prevents coming up
with a register set in the future that has registers of
different sizes...

But given that that's how things are today, I suppose we should
document it...

 The problem what I
 faced was when you use any iovec structure with the length parameter 
 uninitialized,
 the kernel simply ignores and does not return anything.

Ah.  Well, saying does not return anything is quite confusing.  It does
return something -- -EINVAL.

 
 if (!regset || (kiov-iov_len % regset-size) != 0)
 return -EINVAL;

  

 The current documentation is bit misleading and does not explicitly
 specify that iov.len need to be initialized failing which kernel
 may just ignore the ptrace request and never read from/write into
 the user specified buffer.

 You're saying that if iov.len is larger than the NT_XXX_TYPE regset,
 then the kernel returns _success_, but actually doesn't fill the
 buffer?  That sounds like a bug to me.
 
 No, I am not saying that. The kernel takes care of that situation by capping
 the length to regset size of the NT_XXX_TYPE.
 
  kiov-iov_len = min(kiov-iov_len,
 (__kernel_size_t) (regset-n * regset-size));
 
 

OK, then this is what I suggest instead:

   * This interface usage is as follows:
 - *  struct iovec iov = { buf, len};
 + *  struct iovec iov = { buf, len };
   *
   *  ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, 
iov);
   *
   * On the successful completion, iov.len will be updated by the kernel,
 - * specifying how much the kernel has written/read to/from the user's iov.buf.
 + * On entry, iov describes the buffer's address and length.  The buffer's
 + * length

Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation

2014-05-01 Thread Pedro Alves

On 04/28/2014 12:00 PM, Anshuman Khandual wrote:
> The current documentation is bit misleading and does not explicitly
> specify that iov.len need to be initialized failing which kernel
> may just ignore the ptrace request and never read from/write into
> the user specified buffer. This patch fixes the documentation.

Well, it kind of does, here:

*  struct iovec iov = { buf, len};

> @@ -43,8 +43,12 @@
>   *
>   *   ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, );
>   *
> - * On the successful completion, iov.len will be updated by the kernel,
> - * specifying how much the kernel has written/read to/from the user's 
> iov.buf.
> + * A non-zero value upto the max size of data expected to be written/read by 
> the
> + * kernel in response to any NT_XXX_TYPE request type must be assigned to 
> iov.len
> + * before initiating the ptrace call. If iov.len is 0, then kernel will 
> neither
> + * read from or write into the user buffer specified. On successful 
> completion,
> + * iov.len will be updated by the kernel, specifying how much the kernel has
> + * written/read to/from the user's iov.buf.

I really appreciate that you're trying to make this clearer, but I
find the new sentence very hard to read/reason.  :-/

I suggest:

 * This interface usage is as follows:
- *  struct iovec iov = { buf, len};
+ *  struct iovec iov = { buf, len };
 *
 *  ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, );
 *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * On entry, iov describes the buffer's address and length.  The buffer's
+ * length must be equal to or shorter than the size of the NT_XXX_TYPE regset.
+ * On successful completion, iov.len is updated by the kernel, specifying how
+ * much the kernel has written/read to/from the user's iov.buf.

I'm not sure I understood what you're saying correctly, though.  Specifically,
I don't know whether the buffer's length must really be shorter than the
size of the NT_XXX_TYPE regset.

> The current documentation is bit misleading and does not explicitly
> specify that iov.len need to be initialized failing which kernel
> may just ignore the ptrace request and never read from/write into
> the user specified buffer.

You're saying that if iov.len is larger than the NT_XXX_TYPE regset,
then the kernel returns _success_, but actually doesn't fill the
buffer?  That sounds like a bug to me.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory

2014-05-01 Thread Pedro Alves

On 04/28/2014 11:30 AM, Anshuman Khandual wrote:
> On 04/26/2014 05:12 AM, Pedro Alves wrote:
>> On 04/02/2014 08:02 AM, Anshuman Khandual wrote:
>>> This patch adds following new sets of ptrace request macros for 
>>> transactional
>>> memory expanding the existing ptrace ABI on PowerPC.
>>>
>>> /* TM special purpose registers */
>>> PTRACE_GETTM_SPRREGS
>>> PTRACE_SETTM_SPRREGS
>>>
>>> /* TM checkpointed GPR registers */
>>> PTRACE_GETTM_CGPRREGS
>>> PTRACE_SETTM_CGPRREGS
>>>
>>> /* TM checkpointed FPR registers */
>>> PTRACE_GETTM_CFPRREGS
>>> PTRACE_SETTM_CFPRREGS
>>>
>>> /* TM checkpointed VMX registers */
>>> PTRACE_GETTM_CVMXREGS
>>> PTRACE_SETTM_CVMXREGS
>>
>> Urgh, we're _still_ adding specialized register specific calls?
>> Why aren't these exported as new register sets, accessible through
>> PTRACE_GETREGSET /  PTRACE_SETREGSET?  That's supposed to be the
>> Modern Way to do things.
> 
> All these new register sets can be accessed through PTRACE_GETREGSET
> /SETREGSET requests with the new NT_PPC_* core note types added in the
> previous patch. PowerPC already has some register specific ptrace
> requests, so thought of adding some new requests for transactional
> memory purpose. But yes these are redundant and can be dropped.

Thank you.  I assume you guys will be working on gdb support for this,
and I'm really hoping that transaction support in the kernel is being
made uniform across archs.  E.g., on s390, PTRACE_GETREGSET returns
ENODATA if the ptracee was interrupted outside transactions.

  https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory

2014-05-01 Thread Pedro Alves

On 04/28/2014 11:30 AM, Anshuman Khandual wrote:
 On 04/26/2014 05:12 AM, Pedro Alves wrote:
 On 04/02/2014 08:02 AM, Anshuman Khandual wrote:
 This patch adds following new sets of ptrace request macros for 
 transactional
 memory expanding the existing ptrace ABI on PowerPC.

 /* TM special purpose registers */
 PTRACE_GETTM_SPRREGS
 PTRACE_SETTM_SPRREGS

 /* TM checkpointed GPR registers */
 PTRACE_GETTM_CGPRREGS
 PTRACE_SETTM_CGPRREGS

 /* TM checkpointed FPR registers */
 PTRACE_GETTM_CFPRREGS
 PTRACE_SETTM_CFPRREGS

 /* TM checkpointed VMX registers */
 PTRACE_GETTM_CVMXREGS
 PTRACE_SETTM_CVMXREGS

 Urgh, we're _still_ adding specialized register specific calls?
 Why aren't these exported as new register sets, accessible through
 PTRACE_GETREGSET /  PTRACE_SETREGSET?  That's supposed to be the
 Modern Way to do things.
 
 All these new register sets can be accessed through PTRACE_GETREGSET
 /SETREGSET requests with the new NT_PPC_* core note types added in the
 previous patch. PowerPC already has some register specific ptrace
 requests, so thought of adding some new requests for transactional
 memory purpose. But yes these are redundant and can be dropped.

Thank you.  I assume you guys will be working on gdb support for this,
and I'm really hoping that transaction support in the kernel is being
made uniform across archs.  E.g., on s390, PTRACE_GETREGSET returns
ENODATA if the ptracee was interrupted outside transactions.

  https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation

2014-05-01 Thread Pedro Alves

On 04/28/2014 12:00 PM, Anshuman Khandual wrote:
 The current documentation is bit misleading and does not explicitly
 specify that iov.len need to be initialized failing which kernel
 may just ignore the ptrace request and never read from/write into
 the user specified buffer. This patch fixes the documentation.

Well, it kind of does, here:

*  struct iovec iov = { buf, len};

 @@ -43,8 +43,12 @@
   *
   *   ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, iov);
   *
 - * On the successful completion, iov.len will be updated by the kernel,
 - * specifying how much the kernel has written/read to/from the user's 
 iov.buf.
 + * A non-zero value upto the max size of data expected to be written/read by 
 the
 + * kernel in response to any NT_XXX_TYPE request type must be assigned to 
 iov.len
 + * before initiating the ptrace call. If iov.len is 0, then kernel will 
 neither
 + * read from or write into the user buffer specified. On successful 
 completion,
 + * iov.len will be updated by the kernel, specifying how much the kernel has
 + * written/read to/from the user's iov.buf.

I really appreciate that you're trying to make this clearer, but I
find the new sentence very hard to read/reason.  :-/

I suggest:

 * This interface usage is as follows:
- *  struct iovec iov = { buf, len};
+ *  struct iovec iov = { buf, len };
 *
 *  ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, iov);
 *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * On entry, iov describes the buffer's address and length.  The buffer's
+ * length must be equal to or shorter than the size of the NT_XXX_TYPE regset.
+ * On successful completion, iov.len is updated by the kernel, specifying how
+ * much the kernel has written/read to/from the user's iov.buf.

I'm not sure I understood what you're saying correctly, though.  Specifically,
I don't know whether the buffer's length must really be shorter than the
size of the NT_XXX_TYPE regset.

 The current documentation is bit misleading and does not explicitly
 specify that iov.len need to be initialized failing which kernel
 may just ignore the ptrace request and never read from/write into
 the user specified buffer.

You're saying that if iov.len is larger than the NT_XXX_TYPE regset,
then the kernel returns _success_, but actually doesn't fill the
buffer?  That sounds like a bug to me.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB

2014-04-30 Thread Pedro Alves

On 04/30/2014 02:35 PM, Mark Kettenis wrote:
>> Date: Tue, 29 Apr 2014 22:10:15 -0700
>> From: "H. Peter Anvin" 
>>
>> On 04/29/2014 10:08 PM, Andrew Pinski wrote:
>>>
>>> restoring the values is hard since even the ptrace interface does not
>>> allow for that.
>>>
>>
>> So that begs the ultimate question, which is: given the fact that there
>> is *state missing* from the state vector (this is the core of the
>> problem), is there a way we can add that state so that gdb will be able
>> to save and restore it?
> 
> Carrying around additional state in GDB is complicated; I'd rather
> avoid it.

Agreed very much.

> arch/x86/kernel/ptrace.c:putreg32() has this bit of code:
> 
> case offsetof(struct user32, regs.orig_eax):
> /*
>  * A 32-bit debugger setting orig_eax means to restore
>  * the state of the task restarting a 32-bit syscall.
>  * Make sure we interpret the -ERESTART* codes correctly
>  * in case the task is not actually still sitting at the
>  * exit from a 32-bit syscall with TS_COMPAT still set.
>  */
> regs->orig_ax = value;
> if (syscall_get_nr(child, regs) >= 0)
> task_thread_info(child)->status |= TS_COMPAT;
> break;
> 
> which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
> should be added to putreg() if the child is a 32-bit process?

Sounds like the right fix to me.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix get ERESTARTSYS with m32 in x86_64 when debug by GDB

2014-04-30 Thread Pedro Alves

On 04/30/2014 02:35 PM, Mark Kettenis wrote:
 Date: Tue, 29 Apr 2014 22:10:15 -0700
 From: H. Peter Anvin h...@zytor.com

 On 04/29/2014 10:08 PM, Andrew Pinski wrote:

 restoring the values is hard since even the ptrace interface does not
 allow for that.

 So that begs the ultimate question, which is: given the fact that there
 is *state missing* from the state vector (this is the core of the
 problem), is there a way we can add that state so that gdb will be able
 to save and restore it?

 Carrying around additional state in GDB is complicated; I'd rather
 avoid it.

Agreed very much.

 arch/x86/kernel/ptrace.c:putreg32() has this bit of code:

 case offsetof(struct user32, regs.orig_eax):
 /*
  * A 32-bit debugger setting orig_eax means to restore
  * the state of the task restarting a 32-bit syscall.
  * Make sure we interpret the -ERESTART* codes correctly
  * in case the task is not actually still sitting at the
  * exit from a 32-bit syscall with TS_COMPAT still set.
  */
 regs-orig_ax = value;
 if (syscall_get_nr(child, regs) = 0)
 task_thread_info(child)-status |= TS_COMPAT;
 break;

 which gets used for 32-bit compat ptrace(2).  Perhaps the same logic
 should be added to putreg() if the child is a 32-bit process?

Sounds like the right fix to me.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 25/28] nios2: ptrace support

2014-04-25 Thread Pedro Alves

Does this support PTRACE_GETREGSET / PTRACE_SETREGSET ?

IMO, the kernel shouldn't accept ports without those anymore.

And IMHO, we shouldn't even allow new ports having PTRACE_GETREGS.

http://sourceware.org/ml/archer/2010-q3/msg00193.html

-- 
Pedro Alves
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory

2014-04-25 Thread Pedro Alves

On 04/02/2014 08:02 AM, Anshuman Khandual wrote:
> This patch adds following new sets of ptrace request macros for transactional
> memory expanding the existing ptrace ABI on PowerPC.
> 
>   /* TM special purpose registers */
>   PTRACE_GETTM_SPRREGS
>   PTRACE_SETTM_SPRREGS
> 
>   /* TM checkpointed GPR registers */
>   PTRACE_GETTM_CGPRREGS
>   PTRACE_SETTM_CGPRREGS
> 
>   /* TM checkpointed FPR registers */
>   PTRACE_GETTM_CFPRREGS
>   PTRACE_SETTM_CFPRREGS
> 
>   /* TM checkpointed VMX registers */
>   PTRACE_GETTM_CVMXREGS
>   PTRACE_SETTM_CVMXREGS

Urgh, we're _still_ adding specialized register specific calls?
Why aren't these exported as new register sets, accessible through
PTRACE_GETREGSET /  PTRACE_SETREGSET?  That's supposed to be the
Modern Way to do things.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory

2014-04-25 Thread Pedro Alves

On 04/02/2014 08:02 AM, Anshuman Khandual wrote:
 This patch adds following new sets of ptrace request macros for transactional
 memory expanding the existing ptrace ABI on PowerPC.
 
   /* TM special purpose registers */
   PTRACE_GETTM_SPRREGS
   PTRACE_SETTM_SPRREGS
 
   /* TM checkpointed GPR registers */
   PTRACE_GETTM_CGPRREGS
   PTRACE_SETTM_CGPRREGS
 
   /* TM checkpointed FPR registers */
   PTRACE_GETTM_CFPRREGS
   PTRACE_SETTM_CFPRREGS
 
   /* TM checkpointed VMX registers */
   PTRACE_GETTM_CVMXREGS
   PTRACE_SETTM_CVMXREGS

Urgh, we're _still_ adding specialized register specific calls?
Why aren't these exported as new register sets, accessible through
PTRACE_GETREGSET /  PTRACE_SETREGSET?  That's supposed to be the
Modern Way to do things.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 25/28] nios2: ptrace support

2014-04-25 Thread Pedro Alves

Does this support PTRACE_GETREGSET / PTRACE_SETREGSET ?

IMO, the kernel shouldn't accept ports without those anymore.

And IMHO, we shouldn't even allow new ports having PTRACE_GETREGS.

http://sourceware.org/ml/archer/2010-q3/msg00193.html

-- 
Pedro Alves
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

2014-01-09 Thread Pedro Alves

On 01/07/2014 03:30 PM, Oleg Nesterov wrote:
> 
> If we add the new API, perhaps we should change ptrace_resume ?
> I mean,
> 
>   --- x/kernel/ptrace.c
>   +++ x/kernel/ptrace.c
>   @@ -723,7 +723,9 @@ static int ptrace_resume(struct task_str
>   if (!valid_signal(data))
>   return -EIO;
>
>   -   if (request == PTRACE_SYSCALL)
>   +   if (request == PTRACE_SYSCALL ||
>   +   ptrace_event_enabled(PTRACE_EVENT_SYSCALL_ENTER) ||
>   +   ptrace_event_enabled(PTRACE_EVENT_SYSCALL_EXIT))
>   set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
>   else
>   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
> 
> 
> This way PTRACE_O_SYSCALL_* will work like other ptrace options which
> ask to report an event.

+10^6.  With PTRACE_SYSCALL/sysgood, we don't have a way to trace
syscalls when single-stepping, which isn't much of a problem for
strace, but of course is for GDB.  That is one of the things the
new API should definitely sort out.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC/PATCH] Implement new PTRACE_EVENT_SYSCALL_{ENTER,EXIT}

2014-01-09 Thread Pedro Alves

On 01/07/2014 03:30 PM, Oleg Nesterov wrote:
 
 If we add the new API, perhaps we should change ptrace_resume ?
 I mean,
 
   --- x/kernel/ptrace.c
   +++ x/kernel/ptrace.c
   @@ -723,7 +723,9 @@ static int ptrace_resume(struct task_str
   if (!valid_signal(data))
   return -EIO;

   -   if (request == PTRACE_SYSCALL)
   +   if (request == PTRACE_SYSCALL ||
   +   ptrace_event_enabled(PTRACE_EVENT_SYSCALL_ENTER) ||
   +   ptrace_event_enabled(PTRACE_EVENT_SYSCALL_EXIT))
   set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
   else
   clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
 
 
 This way PTRACE_O_SYSCALL_* will work like other ptrace options which
 ask to report an event.

+10^6.  With PTRACE_SYSCALL/sysgood, we don't have a way to trace
syscalls when single-stepping, which isn't much of a problem for
strace, but of course is for GDB.  That is one of the things the
new API should definitely sort out.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: make PTRACE_DETACH work on non-stopped tracees.

2013-06-19 Thread Pedro Alves

On 06/19/2013 05:09 PM, Jan Kratochvil wrote:
> On Wed, 19 Jun 2013 17:15:36 +0200, Denys Vlasenko wrote:
>> CCing Jan to hear his comments from gdb side.

PTRACE_DETACH takes a signal number in the data parameter.
What happens to if the tracer passes a non-zero signal?

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: make PTRACE_DETACH work on non-stopped tracees.

2013-06-19 Thread Pedro Alves

On 06/19/2013 05:09 PM, Jan Kratochvil wrote:
 On Wed, 19 Jun 2013 17:15:36 +0200, Denys Vlasenko wrote:
 CCing Jan to hear his comments from gdb side.

PTRACE_DETACH takes a signal number in the data parameter.
What happens to if the tracer passes a non-zero signal?

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

Forgot to reply to this bit:

On 02/15/2013 07:43 PM, Oleg Nesterov wrote:
>> We'd miss the poke
>> > variant, but that looks like something that could be always be added
>> > later.
> Yes. _POKE_ or _QUEUE_ or _DEQUEUE_, we can add more features if user-
> space wants them.

In general, IMO, I agree with Roland at https://lkml.org/lkml/2002/12/20/84
in that it's good to have setters for completeness, so that you can
change all the state via ptrace that you can read via ptrace.

But I'm not doing any of this work, hence my "could always be
added later" comment instead of actually requesting it.  But if
we had it, we could make e.g., gdb inspect the signal queues,
and then be able to tweak a realtime signal before it is
delivered.

-- 
Pedro Alves
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

On 02/15/2013 07:43 PM, Oleg Nesterov wrote:
> On 02/15, Pedro Alves wrote:
>>
>> Not sure I'm reading the patch right, but it looks like GDB would
>> be able to use this as alternative to PTRACE_GET_SIGINFO variant
> 
> No, it is different. PTRACE_GETSIGINFO reports the siginfo for the signal
> which was already dequeued (ignoring the fact ->last_siginfo != NULL doesn't
> necessarily mean we report a signal), while this patch allows to look at
> the pending signals which were not reported yet.

Ah, I had assumed queue position 0 would be the dequeued signal.

>> I wouldn't mind if this was added unconditionally
>> instead of wrapped on CONFIG_CHECKPOINT_RESTORE.
> 
> I agree. If you think gdb can use this new feature,  CONFIG_ can go away.

I think so -- we can add a gdb command to dump the list of
pending signals, which I think is something that'd be useful.

> 
>> We'd miss the poke
>> variant, but that looks like something that could be always be added
>> later.
> 
> Yes. _POKE_ or _QUEUE_ or _DEQUEUE_, we can add more features if user-
> space wants them.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

On 02/13/2013 03:16 PM, Andrey Vagin wrote:
> This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
> 
> This request is used to retrieve information about a signal with the
> specified sequence number. A siginfo_t structure is copied from the child
> to location data in the parent.
> 
> The low 16 bits of addr contains a sequence number of signal in a queue.
> All other bits of addr is used for flags. Currently here is only one
> flag PTRACE_PEEK_SHARED for dumping signals from process-wide shared
> queue. If this flag is not set, a signal is read from a per-thread
> queue.  A result siginfo contains a kernel part of si_code which usually
> striped, but it's required for queuing the same siginfo back during
> restore of pending signals.
> 
> If a signal with the specified sequence number doesn't exist, ptrace
> returns ENOENT.
> 
> This functionality is required for checkpointing pending signals.
> 
> The prototype of this code was developed by Oleg Nesterov.

Not sure I'm reading the patch right, but it looks like GDB would
be able to use this as alternative to PTRACE_GET_SIGINFO variant
that returns the siginfo_t object in the architecture/bitness of
the tracee, rather than the architecture of the kernel, right?
So it no longer would need to try and replicate the kernel's
siginfo conversions.  I wouldn't mind if this was added unconditionally
instead of wrapped on CONFIG_CHECKPOINT_RESTORE.  We'd miss the poke
variant, but that looks like something that could be always be added
later.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

On 02/13/2013 03:16 PM, Andrey Vagin wrote:
 This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
 
 This request is used to retrieve information about a signal with the
 specified sequence number. A siginfo_t structure is copied from the child
 to location data in the parent.
 
 The low 16 bits of addr contains a sequence number of signal in a queue.
 All other bits of addr is used for flags. Currently here is only one
 flag PTRACE_PEEK_SHARED for dumping signals from process-wide shared
 queue. If this flag is not set, a signal is read from a per-thread
 queue.  A result siginfo contains a kernel part of si_code which usually
 striped, but it's required for queuing the same siginfo back during
 restore of pending signals.
 
 If a signal with the specified sequence number doesn't exist, ptrace
 returns ENOENT.
 
 This functionality is required for checkpointing pending signals.
 
 The prototype of this code was developed by Oleg Nesterov.

Not sure I'm reading the patch right, but it looks like GDB would
be able to use this as alternative to PTRACE_GET_SIGINFO variant
that returns the siginfo_t object in the architecture/bitness of
the tracee, rather than the architecture of the kernel, right?
So it no longer would need to try and replicate the kernel's
siginfo conversions.  I wouldn't mind if this was added unconditionally
instead of wrapped on CONFIG_CHECKPOINT_RESTORE.  We'd miss the poke
variant, but that looks like something that could be always be added
later.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

On 02/15/2013 07:43 PM, Oleg Nesterov wrote:
 On 02/15, Pedro Alves wrote:

 Not sure I'm reading the patch right, but it looks like GDB would
 be able to use this as alternative to PTRACE_GET_SIGINFO variant
 
 No, it is different. PTRACE_GETSIGINFO reports the siginfo for the signal
 which was already dequeued (ignoring the fact -last_siginfo != NULL doesn't
 necessarily mean we report a signal), while this patch allows to look at
 the pending signals which were not reported yet.

Ah, I had assumed queue position 0 would be the dequeued signal.

 I wouldn't mind if this was added unconditionally
 instead of wrapped on CONFIG_CHECKPOINT_RESTORE.
 
 I agree. If you think gdb can use this new feature,  CONFIG_ can go away.

I think so -- we can add a gdb command to dump the list of
pending signals, which I think is something that'd be useful.

 
 We'd miss the poke
 variant, but that looks like something that could be always be added
 later.
 
 Yes. _POKE_ or _QUEUE_ or _DEQUEUE_, we can add more features if user-
 space wants them.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ptrace: add ability to retrieve signals without removing them from a queue

2013-02-15 Thread Pedro Alves

Forgot to reply to this bit:

On 02/15/2013 07:43 PM, Oleg Nesterov wrote:
 We'd miss the poke
  variant, but that looks like something that could be always be added
  later.
 Yes. _POKE_ or _QUEUE_ or _DEQUEUE_, we can add more features if user-
 space wants them.

In general, IMO, I agree with Roland at https://lkml.org/lkml/2002/12/20/84
in that it's good to have setters for completeness, so that you can
change all the state via ptrace that you can read via ptrace.

But I'm not doing any of this work, hence my could always be
added later comment instead of actually requesting it.  But if
we had it, we could make e.g., gdb inspect the signal queues,
and then be able to tweak a realtime signal before it is
delivered.

-- 
Pedro Alves
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Redefinition of struct in6_addr in and

2013-01-18 Thread Pedro Alves

On 01/18/2013 02:24 PM, YOSHIFUJI Hideaki wrote:

>>>> It's simple enough to move all of the __GLIBC__ uses into libc-compat.h,
>>>> then you control userspace libc coordination from one file.
>>>
>>> How about just deciding on a single macro/symbol both the
>>> kernel and libc (any libc that needs this) define?  Something
>>> like both the kernel and userland doing:
>>>
>>> #ifndef __IPV6_BITS_DEFINED
>>> #define __IPV6_BITS_DEFINED
>>> ...
>>> define in6_addr, sockaddr_in6, ipv6_mreq, whatnot
>>> #endif
> 
> Hmm, how should we handle future structs/enums then?
> For example, if I want to have in6_flowlabel_req{} defined in
> glibc, what should we do?

Include the glibc header first?  Or is this some other
use case?

The point wasn't that you'd have only one macro for all
structs/enums (you could split into __IPV6_IN6_ADDR_DEFINED,
__IPV6_SOCKADDR_IN6_DEFINED, etc.) but to have the kernel
and libc agree on guard macros, instead of having the kernel
do #ifdef __GLIBC__ and glibc doing #ifdef _NETINET_IN_H.

But as Carlos says, the devil is in the details, and
I sure am not qualified on the details here.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Redefinition of struct in6_addr in and

2013-01-18 Thread Pedro Alves

On 01/18/2013 04:22 AM, Carlos O'Donell wrote:
> On Thu, Jan 17, 2013 at 11:20 PM, Mike Frysinger  wrote:
>> On Wednesday 16 January 2013 22:15:38 David Miller wrote:
>>> From: Carlos O'Donell 
>>> Date: Wed, 16 Jan 2013 21:15:03 -0500
>>>
>>>> +/* If a glibc-based userspace has already included in.h, then we will
>>>> not + * define in6_addr (nor the defines), sockaddr_in6, or ipv6_mreq.
>>>> The + * ABI used by the kernel and by glibc match exactly. Neither the
>>>> kernel + * nor glibc should break this ABI without coordination.
>>>> + */
>>>> +#ifndef _NETINET_IN_H
>>>> +
>>>
>>> I think we should shoot for a non-glibc-centric solution.
>>>
>>> I can't imagine that other libc's won't have the same exact problem
>>> with their netinet/in.h conflicting with the kernel's, redefining
>>> structures like in6_addr, that we'd want to provide a protection
>>> scheme for here as well.
>>
>> yes, the kernel's use of __GLIBC__ in exported headers has already caused
>> problems in the past.  fortunately, it's been reduced down to just one case
>> now (stat.h).  let's not balloon it back up.
>> -mike
> 
> I also see coda.h has grown a __GLIBC__ usage.
> 
> In the next revision of the patch I created a single libc-compat.h header
> which encompasses the logic for any libc that wants to coordinate with
> the kernel headers.


> It's simple enough to move all of the __GLIBC__ uses into libc-compat.h,
> then you control userspace libc coordination from one file.

How about just deciding on a single macro/symbol both the
kernel and libc (any libc that needs this) define?  Something
like both the kernel and userland doing:

#ifndef __IPV6_BITS_DEFINED
#define __IPV6_BITS_DEFINED
...
define in6_addr, sockaddr_in6, ipv6_mreq, whatnot
#endif

So whichever the application includes first, wins.
Too naive?  I didn't see this option being discarded, so
not sure it was considered.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Redefinition of struct in6_addr in netinet/in.h and linux/in6.h

2013-01-18 Thread Pedro Alves

On 01/18/2013 04:22 AM, Carlos O'Donell wrote:
 On Thu, Jan 17, 2013 at 11:20 PM, Mike Frysinger vap...@gentoo.org wrote:
 On Wednesday 16 January 2013 22:15:38 David Miller wrote:
 From: Carlos O'Donell car...@systemhalted.org
 Date: Wed, 16 Jan 2013 21:15:03 -0500

 +/* If a glibc-based userspace has already included in.h, then we will
 not + * define in6_addr (nor the defines), sockaddr_in6, or ipv6_mreq.
 The + * ABI used by the kernel and by glibc match exactly. Neither the
 kernel + * nor glibc should break this ABI without coordination.
 + */
 +#ifndef _NETINET_IN_H
 +

 I think we should shoot for a non-glibc-centric solution.

 I can't imagine that other libc's won't have the same exact problem
 with their netinet/in.h conflicting with the kernel's, redefining
 structures like in6_addr, that we'd want to provide a protection
 scheme for here as well.

 yes, the kernel's use of __GLIBC__ in exported headers has already caused
 problems in the past.  fortunately, it's been reduced down to just one case
 now (stat.h).  let's not balloon it back up.
 -mike

 I also see coda.h has grown a __GLIBC__ usage.

 In the next revision of the patch I created a single libc-compat.h header
 which encompasses the logic for any libc that wants to coordinate with
 the kernel headers.

 It's simple enough to move all of the __GLIBC__ uses into libc-compat.h,
 then you control userspace libc coordination from one file.

How about just deciding on a single macro/symbol both the
kernel and libc (any libc that needs this) define?  Something
like both the kernel and userland doing:

#ifndef __IPV6_BITS_DEFINED
#define __IPV6_BITS_DEFINED
...
define in6_addr, sockaddr_in6, ipv6_mreq, whatnot
#endif

So whichever the application includes first, wins.
Too naive?  I didn't see this option being discarded, so
not sure it was considered.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Redefinition of struct in6_addr in netinet/in.h and linux/in6.h

2013-01-18 Thread Pedro Alves

On 01/18/2013 02:24 PM, YOSHIFUJI Hideaki wrote:

 It's simple enough to move all of the __GLIBC__ uses into libc-compat.h,
 then you control userspace libc coordination from one file.

 How about just deciding on a single macro/symbol both the
 kernel and libc (any libc that needs this) define?  Something
 like both the kernel and userland doing:

 #ifndef __IPV6_BITS_DEFINED
 #define __IPV6_BITS_DEFINED
 ...
 define in6_addr, sockaddr_in6, ipv6_mreq, whatnot
 #endif
 
 Hmm, how should we handle future structs/enums then?
 For example, if I want to have in6_flowlabel_req{} defined in
 glibc, what should we do?

Include the glibc header first?  Or is this some other
use case?

The point wasn't that you'd have only one macro for all
structs/enums (you could split into __IPV6_IN6_ADDR_DEFINED,
__IPV6_SOCKADDR_IN6_DEFINED, etc.) but to have the kernel
and libc agree on guard macros, instead of having the kernel
do #ifdef __GLIBC__ and glibc doing #ifdef _NETINET_IN_H.

But as Carlos says, the devil is in the details, and
I sure am not qualified on the details here.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PTRACE_SYSCALL && vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check)

2013-01-08 Thread Pedro Alves

On 12/04/2012 05:59 PM, Oleg Nesterov wrote:

> But If we want to allow to trace vsyscall's, hw bp doesn't look very
> nice imo. HBP_NUM = 4 and you need to setup 3 bp's to trace them all.

Irrespective of the whole syscall tracing issue, allowing HW bkpts in
the vsyscall just seems like a bug fix to me.

> That is why I think PTRACE_SYSCALL should "simply work" somehow. And
> so far I think that "just report syscall_exit with orig_ax = -1" is
> the best (and simple) solution.

If you report exit alone, you'll confuse current GDB into mistaking
it for an enter, and all following enter/exits end up swapped/confused.
GDB assumes trap/sysgood alternates between enter/exit, and always
starts with enter.

(Mildly related, GDB has an old comment in the code (linux-nat.c)
that says:

 "However, most architectures can't handle a syscall
 being traced on the way out if it wasn't traced on
 the way in."

I've no clue if that's still true nowadays.)

> OK. We can do more. We can report both syscall_enter/exit and we can
> change orig_ax/ax temporary to "fool" the tracer, so that everything
> will look as a "normal" syscall. Like vsyscall_seccomp() does.
> 
> But this needs much more changes.

I'd just like to add, that if any new syscall related option is
to be added, can we please just go all the way and add
PTRACE_EVENT_SYSCALL_ENTER|PTRACE_EVENT_SYSCALL_EXIT instead?

http://sourceware.org/gdb/wiki/LinuxKernelWishList

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PTRACE_SYSCALL vsyscall (Was: arch_check_bp_in_kernelspace: fix the range check)

2013-01-08 Thread Pedro Alves

On 12/04/2012 05:59 PM, Oleg Nesterov wrote:

 But If we want to allow to trace vsyscall's, hw bp doesn't look very
 nice imo. HBP_NUM = 4 and you need to setup 3 bp's to trace them all.

Irrespective of the whole syscall tracing issue, allowing HW bkpts in
the vsyscall just seems like a bug fix to me.

 That is why I think PTRACE_SYSCALL should simply work somehow. And
 so far I think that just report syscall_exit with orig_ax = -1 is
 the best (and simple) solution.

If you report exit alone, you'll confuse current GDB into mistaking
it for an enter, and all following enter/exits end up swapped/confused.
GDB assumes trap/sysgood alternates between enter/exit, and always
starts with enter.

(Mildly related, GDB has an old comment in the code (linux-nat.c)
that says:

 However, most architectures can't handle a syscall
 being traced on the way out if it wasn't traced on
 the way in.

I've no clue if that's still true nowadays.)

 OK. We can do more. We can report both syscall_enter/exit and we can
 change orig_ax/ax temporary to fool the tracer, so that everything
 will look as a normal syscall. Like vsyscall_seccomp() does.
 
 But this needs much more changes.

I'd just like to add, that if any new syscall related option is
to be added, can we please just go all the way and add
PTRACE_EVENT_SYSCALL_ENTER|PTRACE_EVENT_SYSCALL_EXIT instead?

http://sourceware.org/gdb/wiki/LinuxKernelWishList

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PT_EXITKILL (Was: pdeath_signal)

2012-11-08 Thread Pedro Alves

On 11/08/2012 12:44 PM, Oleg Nesterov wrote:
> On 11/08, Pedro Alves wrote:

>> If this isn't inherited by the ptrace child's children, a fork child can
>> end up detached if the tracer dies before it had a chance of setting
>> the PTRACE_O_EXITKILL on the new auto-attached child.
> 
> It is copied like the other options.

Oh, you're right.  I got confused - GDB has code to always set options
on the fork children after PTRACE_EVENT_(V)FORK.   Dunno where that came from.

>> Which sounds like another argument for PTRACE_O_INHERIT, as in:
>> http://sourceware.org/ml/archer/2011-q1/msg00026.html
> 
>   The point of PTRACE_O_INHERIT would be to attach newly-created threads 
> and
>   children without causing an event stop and the attendant overhead.
> 
> this is another thing, I guess.

Yes, yes.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PT_EXITKILL (Was: pdeath_signal)

2012-11-08 Thread Pedro Alves

On 11/07/2012 03:09 PM, Oleg Nesterov wrote:

>> > > > > What I would IDEALLY like to have is a call, probably a ptrace 
>> > > > > option,
>> > > > > where the parent can request: "If I am ever to terminate or be 
>> > > > > killed,
>> > > > > then my ptraced son MUST die as well".
>> > >
>> > > Perhaps this makes sense...
>> > >
>> > > Chris, iirc you also suggested something like this? And the patch is
>> > > trivial.

(...)

> OK. Please see the untested/uncompiled (but trivial) patch below
> 
>   - it adds PTRACE_O_EXITKILL. A better name?
> 
>   - A better numeric value? Note that the new option is not equal to
> the last-ptrace-option << 1. Because currently all options have
> the event, and the new one starts the eventless group. 1 << 16
> means we have the room for 8 more events.
> 
>   - it needs the convincing changelog for akpm


If this isn't inherited by the ptrace child's children, a fork child can
end up detached if the tracer dies before it had a chance of setting
the PTRACE_O_EXITKILL on the new auto-attached child.

Which sounds like another argument for PTRACE_O_INHERIT, as in:
http://sourceware.org/ml/archer/2011-q1/msg00026.html

(it sounds like you need to use PTRACE_SEIZE+options too to plug
the race between PTRACE_ME/PTRACE_ATTACH and
setting PTRACE_SETOPTIONS).

(For completeness, Windows' age old equivalent,
DebugSetProcessKillOnExit, it a tracer option, not tracee option, though
that's not as flexible.)

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PT_EXITKILL (Was: pdeath_signal)

2012-11-08 Thread Pedro Alves

On 11/07/2012 03:09 PM, Oleg Nesterov wrote:

 What I would IDEALLY like to have is a call, probably a ptrace 
 option,
 where the parent can request: If I am ever to terminate or be 
 killed,
 then my ptraced son MUST die as well.
  
   Perhaps this makes sense...
  
   Chris, iirc you also suggested something like this? And the patch is
   trivial.

(...)

 OK. Please see the untested/uncompiled (but trivial) patch below
 
   - it adds PTRACE_O_EXITKILL. A better name?
 
   - A better numeric value? Note that the new option is not equal to
 the last-ptrace-option  1. Because currently all options have
 the event, and the new one starts the eventless group. 1  16
 means we have the room for 8 more events.
 
   - it needs the convincing changelog for akpm


If this isn't inherited by the ptrace child's children, a fork child can
end up detached if the tracer dies before it had a chance of setting
the PTRACE_O_EXITKILL on the new auto-attached child.

Which sounds like another argument for PTRACE_O_INHERIT, as in:
http://sourceware.org/ml/archer/2011-q1/msg00026.html

(it sounds like you need to use PTRACE_SEIZE+options too to plug
the race between PTRACE_ME/PTRACE_ATTACH and
setting PTRACE_SETOPTIONS).

(For completeness, Windows' age old equivalent,
DebugSetProcessKillOnExit, it a tracer option, not tracee option, though
that's not as flexible.)

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PT_EXITKILL (Was: pdeath_signal)

2012-11-08 Thread Pedro Alves

On 11/08/2012 12:44 PM, Oleg Nesterov wrote:
 On 11/08, Pedro Alves wrote:

 If this isn't inherited by the ptrace child's children, a fork child can
 end up detached if the tracer dies before it had a chance of setting
 the PTRACE_O_EXITKILL on the new auto-attached child.
 
 It is copied like the other options.

Oh, you're right.  I got confused - GDB has code to always set options
on the fork children after PTRACE_EVENT_(V)FORK.   Dunno where that came from.

 Which sounds like another argument for PTRACE_O_INHERIT, as in:
 http://sourceware.org/ml/archer/2011-q1/msg00026.html
 
   The point of PTRACE_O_INHERIT would be to attach newly-created threads 
 and
   children without causing an event stop and the attendant overhead.
 
 this is another thing, I guess.

Yes, yes.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] coredump: add a new elf note with siginfo fields of the signal

2012-09-17 Thread Pedro Alves

On 09/13/2012 06:25 PM, Roland McGrath wrote:
> I agree with Oleg.  If there is an NT_SIGINFO note, it should contain
> exactly the siginfo_t layout and data that we otherwise expose to userland
> already.  That is, it must match what PTRACE_GETSIGINFO reports, which (I
> think) also matches exactly what appears on the stack for a signal
> delivery.  

Agreed.  Please.  One note though:

PTRACE_GETSIGINFO always returns the siginfo_t in the architecture/bitness
of the kernel.  IOW, PTRACE_GETSIGINFO on a 32-bit debuggee, running on a
64-bit kernel, returns the siginfo_t in 64-bit layout.  IOW, 
copy_siginfo_to_user32
doesn't apply).  Or maybe that's the architecture of the tracer, instead of the 
kernel.
Haven't really checked.  In any case, GDB has to basically duplicate the 
kernel's
copy_siginfo_to_user/from_user for each supported arch (that does biarch) 
because
of this, for "(gdb) print $_siginfo".
I wish we had a PTRACE_GETSIGINFO variant that returned the siginfo_t in
layout of the program, not the kernel's..

> Note also that compat_binfmt_elf must deliver the 32-bit version
> of siginfo_t, i.e. as copy_siginfo_to_user32 does.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] coredump: add a new elf note with siginfo fields of the signal

2012-09-17 Thread Pedro Alves

On 09/13/2012 06:25 PM, Roland McGrath wrote:
 I agree with Oleg.  If there is an NT_SIGINFO note, it should contain
 exactly the siginfo_t layout and data that we otherwise expose to userland
 already.  That is, it must match what PTRACE_GETSIGINFO reports, which (I
 think) also matches exactly what appears on the stack for a signal
 delivery.  

Agreed.  Please.  One note though:

PTRACE_GETSIGINFO always returns the siginfo_t in the architecture/bitness
of the kernel.  IOW, PTRACE_GETSIGINFO on a 32-bit debuggee, running on a
64-bit kernel, returns the siginfo_t in 64-bit layout.  IOW, 
copy_siginfo_to_user32
doesn't apply).  Or maybe that's the architecture of the tracer, instead of the 
kernel.
Haven't really checked.  In any case, GDB has to basically duplicate the 
kernel's
copy_siginfo_to_user/from_user for each supported arch (that does biarch) 
because
of this, for (gdb) print $_siginfo.
I wish we had a PTRACE_GETSIGINFO variant that returned the siginfo_t in
layout of the program, not the kernel's..

 Note also that compat_binfmt_elf must deliver the 32-bit version
 of siginfo_t, i.e. as copy_siginfo_to_user32 does.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-06 Thread Pedro Alves

On 09/06/2012 02:53 PM, Sasha Levin wrote:

> So I think that for the hash iterator it might actually be simpler.
> 
> My solution to making 'break' work in the iterator is:
> 
>   for (bkt = 0, node = NULL; bkt < HASH_SIZE(name) && node == NULL; bkt++)
>   hlist_for_each_entry(obj, node, [bkt], member)
> 
> We initialize our node loop cursor with NULL in the external loop, and the
> external loop will have a new condition to loop while that cursor is NULL.
> 
> My logic is that we can only 'break' when we are iterating over an object in 
> the
> internal loop. If we're iterating over an object in that loop then 'node != 
> NULL'.
> 
> This way, if we broke from within the internal loop, the external loop will 
> see
> node as not NULL, and so it will stop looping itself. On the other hand, if 
> the
> internal loop has actually ended, then node will be NULL, and the outer loop
> will keep running.
> 
> Is there anything I've missed?

Looks right to me, from a cursory look at hlist_for_each_entry.  That's exactly
what I meant with this most often being trivial when the inner loop's iterator
is a pointer that goes NULL at the end.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-06 Thread Pedro Alves

On 09/06/2012 02:53 PM, Sasha Levin wrote:

 So I think that for the hash iterator it might actually be simpler.
 
 My solution to making 'break' work in the iterator is:
 
   for (bkt = 0, node = NULL; bkt  HASH_SIZE(name)  node == NULL; bkt++)
   hlist_for_each_entry(obj, node, name[bkt], member)
 
 We initialize our node loop cursor with NULL in the external loop, and the
 external loop will have a new condition to loop while that cursor is NULL.
 
 My logic is that we can only 'break' when we are iterating over an object in 
 the
 internal loop. If we're iterating over an object in that loop then 'node != 
 NULL'.
 
 This way, if we broke from within the internal loop, the external loop will 
 see
 node as not NULL, and so it will stop looping itself. On the other hand, if 
 the
 internal loop has actually ended, then node will be NULL, and the outer loop
 will keep running.
 
 Is there anything I've missed?

Looks right to me, from a cursory look at hlist_for_each_entry.  That's exactly
what I meant with this most often being trivial when the inner loop's iterator
is a pointer that goes NULL at the end.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 11:41 PM, Steven Rostedt wrote:
> Ah, I missed the condition with the rec == >records[pg->index]. But
> if ftrace_pages_start is NULL, the rec = >records[pg->index] will
> fault.

Right.

> 
> You could do something like rec = pg ? >records[pg->index] : NULL,

Right.

> but IIRC, the comma operator does not guarantee order evaluation. That
> is, the compiler is allowed to process "a , b" as "b; a;" and not "a;
> b;".

Not true.  The comma operator introduces a sequence point.  It's the comma
that separates function parameters that doesn't guarantee ordering.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 09:59 PM, Steven Rostedt wrote:
> On Tue, 2012-09-04 at 18:21 +0100, Pedro Alves wrote:
>> On 09/04/2012 06:17 PM, Steven Rostedt wrote:
>>> On Tue, 2012-09-04 at 17:40 +0100, Pedro Alves wrote:
>>>
>>>> BTW, you can also go a step further and remove the need to close with 
>>>> double }},
>>>> with something like:
>>>>
>>>> #define do_for_each_ftrace_rec(pg, rec)
>>>>   \
>>>> for (pg = ftrace_pages_start, rec = >records[pg->index];   
>>>>   \
>>>>  pg && rec == >records[pg->index]; 
>>>>   \
>>>>  pg = pg->next)
>>>>   \
>>>>   for (rec = pg->records; rec < >records[pg->index]; rec++)
>>>>
>>>
>>> Yeah, but why bother? It's hidden in a macro, and the extra '{ }' shows
>>> that this is something "special".
>>
>> The point of both changes is that there's nothing special in the end
>> at all.  It all just works...
>>
> 
> It would still fail on a 'break'. The 'while' macro tells us that it is
> special, because in the end, it wont work.

Please explain why it would fail on a 'break'.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 06:17 PM, Steven Rostedt wrote:
> On Tue, 2012-09-04 at 17:40 +0100, Pedro Alves wrote:
> 
>> BTW, you can also go a step further and remove the need to close with double 
>> }},
>> with something like:
>>
>> #define do_for_each_ftrace_rec(pg, rec)  
>> \
>> for (pg = ftrace_pages_start, rec = >records[pg->index]; 
>> \
>>  pg && rec == >records[pg->index];   
>> \
>>  pg = pg->next)  
>> \
>>   for (rec = pg->records; rec < >records[pg->index]; rec++)
>>
> 
> Yeah, but why bother? It's hidden in a macro, and the extra '{ }' shows
> that this is something "special".

The point of both changes is that there's nothing special in the end
at all.  It all just works...

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 05:30 PM, Pedro Alves wrote:
> On 09/04/2012 04:35 PM, Steven Rostedt wrote:
>> On Tue, 2012-08-28 at 19:00 -0400, Mathieu Desnoyers wrote:
>>
>>> Looking again at:
>>>
>>> +#define hash_for_each_size(name, bits, bkt, node, obj, member) 
>>> \
>>> +   for (bkt = 0; bkt < HASH_SIZE(bits); bkt++) 
>>> \
>>> +   hlist_for_each_entry(obj, node, [bkt], member)
>>>
>>> you will notice that a "break" or "continue" in the inner loop will not
>>> affect the outer loop, which is certainly not what the programmer would
>>> expect!
>>>
>>> I advise strongly against creating such error-prone construct.
>>>
>>
>> A few existing loop macros do this. But they require a do { } while ()
>> approach, and all have a comment.
>>
>> It's used by do_each_thread() in sched.h and ftrace does this as well.
>> Look at kernel/trace/ftrace.c at do_for_each_ftrace_rec().
>>
>> Yes it breaks 'break' but it does not break 'continue' as it would just
>> go to the next item that would have been found (like a normal for
>> would).
> 
> /*
>  * This is a double for. Do not use 'break' to break out of the loop,
>  * you must use a goto.
>  */
> #define do_for_each_ftrace_rec(pg, rec) \
> for (pg = ftrace_pages_start; pg; pg = pg->next) {  \
> int _i; \
> for (_i = 0; _i < pg->index; _i++) {\
> rec = >records[_i];
> 
> 
> 
> You can make 'break' also work as expected if you can embed a little knowledge
> of the inner loop's condition in the outer loop's condition.  Sometimes it's
> trivial, most often when the inner loop's iterator is a pointer that goes
> NULL at the end, but other times not so much.  Something like (completely 
> untested):
> 
> #define do_for_each_ftrace_rec(pg, rec) \
> for (pg = ftrace_pages_start, rec = >records[pg->index];\
>  pg && rec == >records[pg->index];  \
>  pg = pg->next) {   \
> int _i; \
> for (_i = 0; _i < pg->index; _i++) {\
> rec = >records[_i];
>
> 
> (other variants possible)
> 
> IOW, the outer loop only iterates if the inner loop completes.  If there's
> a break in the inner loop, then the outer loop breaks too.  Of course, it
> all depends on whether the generated code looks sane or hideous, if
> the uses of the macro care for it over bug avoidance.
> 

BTW, you can also go a step further and remove the need to close with double }},
with something like:

#define do_for_each_ftrace_rec(pg, rec) 
 \
for (pg = ftrace_pages_start, rec = >records[pg->index];
 \
 pg && rec == >records[pg->index];  
 \
 pg = pg->next) 
 \
  for (rec = pg->records; rec < >records[pg->index]; rec++)

-- 
Pedro Alves
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 04:35 PM, Steven Rostedt wrote:
> On Tue, 2012-08-28 at 19:00 -0400, Mathieu Desnoyers wrote:
> 
>> Looking again at:
>>
>> +#define hash_for_each_size(name, bits, bkt, node, obj, member)  
>>\
>> +   for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)  
>>\
>> +   hlist_for_each_entry(obj, node, [bkt], member)
>>
>> you will notice that a "break" or "continue" in the inner loop will not
>> affect the outer loop, which is certainly not what the programmer would
>> expect!
>>
>> I advise strongly against creating such error-prone construct.
>>
> 
> A few existing loop macros do this. But they require a do { } while ()
> approach, and all have a comment.
> 
> It's used by do_each_thread() in sched.h and ftrace does this as well.
> Look at kernel/trace/ftrace.c at do_for_each_ftrace_rec().
> 
> Yes it breaks 'break' but it does not break 'continue' as it would just
> go to the next item that would have been found (like a normal for
> would).

/*
 * This is a double for. Do not use 'break' to break out of the loop,
 * you must use a goto.
 */
#define do_for_each_ftrace_rec(pg, rec) \
for (pg = ftrace_pages_start; pg; pg = pg->next) {  \
int _i; \
for (_i = 0; _i < pg->index; _i++) {\
rec = >records[_i];



You can make 'break' also work as expected if you can embed a little knowledge
of the inner loop's condition in the outer loop's condition.  Sometimes it's
trivial, most often when the inner loop's iterator is a pointer that goes
NULL at the end, but other times not so much.  Something like (completely 
untested):

#define do_for_each_ftrace_rec(pg, rec) \
for (pg = ftrace_pages_start, rec = >records[pg->index];\
 pg && rec == >records[pg->index];  \
 pg = pg->next) {   \
int _i; \
for (_i = 0; _i < pg->index; _i++) {\
rec = >records[_i];

(other variants possible)

IOW, the outer loop only iterates if the inner loop completes.  If there's
a break in the inner loop, then the outer loop breaks too.  Of course, it
all depends on whether the generated code looks sane or hideous, if
the uses of the macro care for it over bug avoidance.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves

On 09/04/2012 04:35 PM, Steven Rostedt wrote:
 On Tue, 2012-08-28 at 19:00 -0400, Mathieu Desnoyers wrote:
 
 Looking again at:

 +#define hash_for_each_size(name, bits, bkt, node, obj, member)  
\
 +   for (bkt = 0; bkt  HASH_SIZE(bits); bkt++)  
\
 +   hlist_for_each_entry(obj, node, name[bkt], member)

 you will notice that a break or continue in the inner loop will not
 affect the outer loop, which is certainly not what the programmer would
 expect!

 I advise strongly against creating such error-prone construct.

 
 A few existing loop macros do this. But they require a do { } while ()
 approach, and all have a comment.
 
 It's used by do_each_thread() in sched.h and ftrace does this as well.
 Look at kernel/trace/ftrace.c at do_for_each_ftrace_rec().
 
 Yes it breaks 'break' but it does not break 'continue' as it would just
 go to the next item that would have been found (like a normal for
 would).

/*
 * This is a double for. Do not use 'break' to break out of the loop,
 * you must use a goto.
 */
#define do_for_each_ftrace_rec(pg, rec) \
for (pg = ftrace_pages_start; pg; pg = pg-next) {  \
int _i; \
for (_i = 0; _i  pg-index; _i++) {\
rec = pg-records[_i];



You can make 'break' also work as expected if you can embed a little knowledge
of the inner loop's condition in the outer loop's condition.  Sometimes it's
trivial, most often when the inner loop's iterator is a pointer that goes
NULL at the end, but other times not so much.  Something like (completely 
untested):

#define do_for_each_ftrace_rec(pg, rec) \
for (pg = ftrace_pages_start, rec = pg-records[pg-index];\
 pg  rec == pg-records[pg-index];  \
 pg = pg-next) {   \
int _i; \
for (_i = 0; _i  pg-index; _i++) {\
rec = pg-records[_i];

(other variants possible)

IOW, the outer loop only iterates if the inner loop completes.  If there's
a break in the inner loop, then the outer loop breaks too.  Of course, it
all depends on whether the generated code looks sane or hideous, if
the uses of the macro care for it over bug avoidance.

-- 
Pedro Alves

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

1 - 100 of 104 matches

Mail list logo