Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-04 Thread Linus Torvalds
On Wed, Feb 3, 2021 at 3:55 PM Kyle Huey  wrote:
>
> I have verified that a) the test case I sent earlier passes now and b)
> all rr tests pass now.

Thanks for keeping on top of this.

Thomas/Andy - the patch looks straightforward and obvious enough, and
I don't see any issues with it, so I assume I'll get it through the
normal channels and will archive this whole discussion.

No huge hurry, as long as it hits 5.11 final so that we don't end up
with a regression.

Thanks,
   Linus


Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Kyle Huey
On Wed, Feb 3, 2021 at 10:11 AM Kyle Huey  wrote:
>
> On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
>  wrote:
> > This seems to pass Kyle's test case.  Kyle, can you verify it works with
> > rr?
>
> I will test it later today.

I have verified that a) the test case I sent earlier passes now and b)
all rr tests pass now.

Tested-by: Kyle Huey 

- Kyle


Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Linus Torvalds
On Wed, Feb 3, 2021 at 10:18 AM Andy Lutomirski  wrote:
>
> So if we want to reliably single-step a system call and trap after the
> system call, we just need to synthesize a trap on the way out.

Well, I think Gabriel's patch does exactly that, due to how
SYSCALL_EXIT_TRAP is set. It looks like subsequent system calls will
work exactly the way the concurrent system call case (that Kyle's test
did) does.

So it all _looks_ sane to me, but this is one of those "it needs testing".

   Linus


Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Andy Lutomirski
On Wed, Feb 3, 2021 at 10:10 AM Linus Torvalds
 wrote:
>
> On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
>  wrote:
> >
> > Does the patch below follows your suggestion?  I'm setting the
> > SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> > the child is inside a system call.  Is this acceptable?
>
> Looks sane to me.
>
> My main worry would be about "what about the next system call"? It's
> not what Kyle's case cares about, but let me just give an example:
>
>  - task A traces task B, and starts single-stepping. Task B was *not*
> in a system call at this point.
>
>  - task B happily executes one instruction at a time, takes a TF
> fault, everything is good
>
>  - task B now does a system call. That will disable single-stepping
> while in the kernel
>
>  - task B returns from the system call. TF will be set in eflags, but
> the first instruction *after* the system call will execute unless we
> go through the system call exit path
>
> So I think the tracer basically misses one instruction when single-stepping.

I was hoping you wouldn't ask this :)

The x86 architecture is fundamentally a bit busted here.  If we return
from a system call with SYSRET and TF is set in R11, then SYSRET
traps, which means that #DB is delivered before executing a user
instruction.  I have been asking Intel for quite a while to document
this, and they said they did, but I still can't find it.  IRET is the
opposite: if we return from a system call with IRET and TF is set on
the stack, we execute one user instruction and then trap.

So if we want to reliably single-step a system call and trap after the
system call, we just need to synthesize a trap on the way out.  Doing
this and getting all the nasty corners (e.g. sigreturn setting TF,
sigreturn *clearing* TF, signal delivery as part of the syscall,
ptrace mucking with TF) etc right might be nontrivial.

I suspect the behavior back in the bad old asm-entry-path days was at
best inconsistent.

--Andy


Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Kyle Huey
On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
 wrote:
>
> Linus Torvalds  writes:
>
> > On Sun, Jan 31, 2021 at 3:35 PM Linus Torvalds
> >  wrote:
> >>
> >> I wonder if the simple solution is to just
> >>
> >>  (a) always set one of the SYSCALL_WORK_EXIT bits on the child in
> >> ptrace (exactly to catch the child on system call exit)
> >>
> >>  (b) basically revert 299155244770 ("entry: Drop usage of TIF flags in
> >> the generic syscall code") and have the syscall exit code check the
> >> TIF_SINGLESTEP flag
> >
> > Actually, (b) looks unnecessary - as long as we get to
> > syscall_exit_work(), the current code will work fine.
> >
> > So maybe just add a dummy SYSCALL_WORK_SYSCALL_EXIT_TRAP, and set that
> > flag whenever a singestep is requested for a process that is currently
> > in a system call?
> >
> > IOW, make it a very explicit "do TF for system calls", rather than the
> > old code that was doing so implicitly and not very obviously. Hmm?
>
> Linus,
>
> Does the patch below follows your suggestion?  I'm setting the
> SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> the child is inside a system call.  Is this acceptable?
>
> This seems to pass Kyle's test case.  Kyle, can you verify it works with
> rr?

I will test it later today.

> I can also turn Kyle's test case into a selftest, if it is ok with him.

Sure. Consider whatever license/copyright/etc you need granted.

- Kyle

> Thanks,
>
> -- >8 --
> Subject: [PATCH] entry: Fix missed trap after single-step on a system call 
> return
>
> Commit 299155244770 ("entry: Drop usage of TIF flags in the generic
> syscall code") introduces a bug on architectures using the generic
> syscall entry code, in which processes stopped by PTRACE_SYSCALL do not
> trap on syscall return after receiving a TIF_SINGLESTEP. The reason is
> the meaning of TIF_SINGLESTEP flag is overloaded to cause the trap after
> a system call is executed, but since the above commit, the syscall call
> handler only checks for the SYSCALL_WORK flags on the exit work.
>
> This patch splits the meaning of TIF_SINGLESTEP such that it only means
> single-step mode, and creates a new type of SYSCALL_WORK to request a
> trap immediately after a syscall in single-step mode.  In the current
> implementation, the SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag
> for simplicity.
>
> Since x86 is the only code already using the generic syscall handling,
> this also updates that architecture to flip this bit when a tracer
> enables single step.
>
> Suggested-by: Linus Torvalds 
> Fixes: 299155244770 ("entry: Drop usage of TIF flags in the generic syscall 
> code")
> Signed-off-by: Gabriel Krisman Bertazi 
> ---
>  arch/x86/include/asm/entry-common.h |  2 --
>  arch/x86/kernel/step.c  | 10 --
>  include/linux/entry-common.h|  1 +
>  include/linux/thread_info.h |  2 ++
>  kernel/entry/common.c   | 12 ++--
>  5 files changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/entry-common.h 
> b/arch/x86/include/asm/entry-common.h
> index 6fe54b2813c1..2b87b191b3b8 100644
> --- a/arch/x86/include/asm/entry-common.h
> +++ b/arch/x86/include/asm/entry-common.h
> @@ -43,8 +43,6 @@ static __always_inline void arch_check_user_regs(struct 
> pt_regs *regs)
>  }
>  #define arch_check_user_regs arch_check_user_regs
>
> -#define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP)
> -
>  static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
>   unsigned long ti_work)
>  {
> diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
> index 60d2c3798ba2..de975b62f10a 100644
> --- a/arch/x86/kernel/step.c
> +++ b/arch/x86/kernel/step.c
> @@ -127,12 +127,17 @@ static int enable_single_step(struct task_struct *child)
> regs->flags |= X86_EFLAGS_TF;
>
> /*
> -* Always set TIF_SINGLESTEP - this guarantees that
> -* we single-step system calls etc..  This will also
> +* Always set TIF_SINGLESTEP.  This will also
>  * cause us to set TF when returning to user mode.
>  */
> set_tsk_thread_flag(child, TIF_SINGLESTEP);
>
> +   /*
> +* Trigger a trap is triggered once stepping out of a system
> +* call prior to executing any user instruction.
> +*/
> +   set_task_syscall_work(child, SYSCALL_EXIT_TRAP);
> +
> oflags = regs->flags;
>
> /* Set TF on the kernel stack.. */

Re: [PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Linus Torvalds
On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
 wrote:
>
> Does the patch below follows your suggestion?  I'm setting the
> SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> the child is inside a system call.  Is this acceptable?

Looks sane to me.

My main worry would be about "what about the next system call"? It's
not what Kyle's case cares about, but let me just give an example:

 - task A traces task B, and starts single-stepping. Task B was *not*
in a system call at this point.

 - task B happily executes one instruction at a time, takes a TF
fault, everything is good

 - task B now does a system call. That will disable single-stepping
while in the kernel

 - task B returns from the system call. TF will be set in eflags, but
the first instruction *after* the system call will execute unless we
go through the system call exit path

So I think the tracer basically misses one instruction when single-stepping.

I think your patch works for this case (because the SYSCALL_EXIT_TRAP
flag stays set until single-stepping is done), so I think it's all
good. But can you verify, just to allay my worry?

 Linus


[PATCH] entry: Fix missed trap after single-step on system call return

2021-02-03 Thread Gabriel Krisman Bertazi
Linus Torvalds  writes:

> On Sun, Jan 31, 2021 at 3:35 PM Linus Torvalds
>  wrote:
>>
>> I wonder if the simple solution is to just
>>
>>  (a) always set one of the SYSCALL_WORK_EXIT bits on the child in
>> ptrace (exactly to catch the child on system call exit)
>>
>>  (b) basically revert 299155244770 ("entry: Drop usage of TIF flags in
>> the generic syscall code") and have the syscall exit code check the
>> TIF_SINGLESTEP flag
>
> Actually, (b) looks unnecessary - as long as we get to
> syscall_exit_work(), the current code will work fine.
>
> So maybe just add a dummy SYSCALL_WORK_SYSCALL_EXIT_TRAP, and set that
> flag whenever a singestep is requested for a process that is currently
> in a system call?
>
> IOW, make it a very explicit "do TF for system calls", rather than the
> old code that was doing so implicitly and not very obviously. Hmm?

Linus,

Does the patch below follows your suggestion?  I'm setting the
SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
the child is inside a system call.  Is this acceptable?

This seems to pass Kyle's test case.  Kyle, can you verify it works with
rr?

I can also turn Kyle's test case into a selftest, if it is ok with him.

Thanks,

-- >8 --
Subject: [PATCH] entry: Fix missed trap after single-step on a system call 
return

Commit 299155244770 ("entry: Drop usage of TIF flags in the generic
syscall code") introduces a bug on architectures using the generic
syscall entry code, in which processes stopped by PTRACE_SYSCALL do not
trap on syscall return after receiving a TIF_SINGLESTEP. The reason is
the meaning of TIF_SINGLESTEP flag is overloaded to cause the trap after
a system call is executed, but since the above commit, the syscall call
handler only checks for the SYSCALL_WORK flags on the exit work.

This patch splits the meaning of TIF_SINGLESTEP such that it only means
single-step mode, and creates a new type of SYSCALL_WORK to request a
trap immediately after a syscall in single-step mode.  In the current
implementation, the SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag
for simplicity.

Since x86 is the only code already using the generic syscall handling,
this also updates that architecture to flip this bit when a tracer
enables single step.

Suggested-by: Linus Torvalds 
Fixes: 299155244770 ("entry: Drop usage of TIF flags in the generic syscall 
code")
Signed-off-by: Gabriel Krisman Bertazi 
---
 arch/x86/include/asm/entry-common.h |  2 --
 arch/x86/kernel/step.c  | 10 --
 include/linux/entry-common.h|  1 +
 include/linux/thread_info.h |  2 ++
 kernel/entry/common.c   | 12 ++--
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/entry-common.h 
b/arch/x86/include/asm/entry-common.h
index 6fe54b2813c1..2b87b191b3b8 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -43,8 +43,6 @@ static __always_inline void arch_check_user_regs(struct 
pt_regs *regs)
 }
 #define arch_check_user_regs arch_check_user_regs
 
-#define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP)
-
 static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
  unsigned long ti_work)
 {
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 60d2c3798ba2..de975b62f10a 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -127,12 +127,17 @@ static int enable_single_step(struct task_struct *child)
regs->flags |= X86_EFLAGS_TF;
 
/*
-* Always set TIF_SINGLESTEP - this guarantees that
-* we single-step system calls etc..  This will also
+* Always set TIF_SINGLESTEP.  This will also
 * cause us to set TF when returning to user mode.
 */
set_tsk_thread_flag(child, TIF_SINGLESTEP);
 
+   /*
+* Trigger a trap is triggered once stepping out of a system
+* call prior to executing any user instruction.
+*/
+   set_task_syscall_work(child, SYSCALL_EXIT_TRAP);
+
oflags = regs->flags;
 
/* Set TF on the kernel stack.. */
@@ -230,6 +235,7 @@ void user_disable_single_step(struct task_struct *child)
 
/* Always clear TIF_SINGLESTEP... */
clear_tsk_thread_flag(child, TIF_SINGLESTEP);
+   clear_task_syscall_work(child, SYSCALL_EXIT_TRAP);
 
/* But touch TF only if it was set by us.. */
if (test_and_clear_tsk_thread_flag(child, TIF_FORCED_TF))
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index ca86a00abe86..a104b298019a 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -46,6 +46,7 @@
 SYSCALL_WORK_SYSCALL_TRACE |   \