Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-04-03 Thread Torsten Duwe
On Wed, Apr 03, 2019 at 03:48:43AM +0100, Mark Rutland wrote:
> Hi Torsten,
> 
> Sorry for the long delay prior to this reply.

I was hoping you would come up with some code to speed things up :(

(For the record, v8 was the latest I sent, but nothing in the locations
mentioned here has changed)

> On Fri, Jan 18, 2019 at 05:39:08PM +0100, Torsten Duwe wrote:
> > --- a/arch/arm64/include/asm/ftrace.h
> > +++ b/arch/arm64/include/asm/ftrace.h
> > @@ -14,9 +14,24 @@
> >  #include 
> >  
> >  #define HAVE_FUNCTION_GRAPH_FP_TEST
> > -#define MCOUNT_ADDR  ((unsigned long)_mcount)
> >  #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
> >  
> > +/*
> > + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
> > + * of each function, with the second NOP actually calling ftrace. Contrary
> > + * to a classic _mcount call, the call instruction to be modified is thus
> > + * the second one, and not the only one.
> > + */
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > +/* All we need is some magic value. Simply use "_mCount:" */
> > +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
> 
> I'm really not keen on having a magic constant, and I'd really like to
> see MCOUNT_ADDR disappear entirely when the compiler doesn't generate an
> mcount call. I'm concerned that this is confusing and fragile.

Confusing: agreed. Fragile? I don't think so.

> I think that it would be better to make the core ftrace API agnostic of
> mcount, and to teach it that there's a one-time initialization step for
> callsites (which is not necessarily patching a call with a NOP).
> 
> We currently have:
> 
> ftrace_make_nop(mod, rec, addr)
> ftrace_make_call(rec, addr)
> ftrace_modify_call(rec, old_addr, new_addr)
> 
> ... whereas we could have:
> 
> ftrace_call_init(mod, rec)
> ftrace_call_enable(rec, addr)
> ftrace_call_disable(rec, addr)
> ftrace_call_modify(rec, old_addr, new_addr)
> 
> ... so we wouldn't need to special-case anything for initialization (and
> don't need MCOUNT_ADDR at all), and it would be clearer as to what was
> happening at each stage.

I'm fully on your side here, BUT...

This is the dynamic ftrace with regs implementation for aarch64 with the
_current_ dynamic ftrace API. Changing the latter is IMO a different issue.
This implementation has been tested, up to the point of some preliminary live
patching. I propose to commit this and only then change the API for aarch64
and s390 simultaneously, avoiding breakage on ppc64le and x86, of course.
(others to be investigated)

> > +
> > +   /* The program counter just after the ftrace call site */
> > +   str lr, [sp, #S_PC]
> 
> For consistency with the existing ftrace assembly, please use 'x30'
> rather than 'lr'.

You could have raised that concern along with the "fp" issue back at v6.
I don't have a strong preference here; personally I find fp and lr more
readable, but with x29 and x30 one knows exactly where the values go.
Going back and forth on this is just a waste of time.

> > -   module_disable_ro(mod);
> > -   *mod->arch.ftrace_trampoline = trampoline;
> > -   module_enable_ro(mod, true);
> > +   /* Check against our well-known list of ftrace entry points */
> > +   if (addr == FTRACE_ADDR || addr == FTRACE_REGS_ADDR) {
> 
> These checks exist within install_ftrace_trampoline(), so we don't need
> to duplicate them here.

True. Code evolution at work.

Any other opinions on the dynamic ftrace API change, anyone?

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-04-03 Thread Steven Rostedt
On Wed, 3 Apr 2019 03:48:43 +0100
Mark Rutland  wrote:

> We currently have:
> 
> ftrace_make_nop(mod, rec, addr)
> ftrace_make_call(rec, addr)
> ftrace_modify_call(rec, old_addr, new_addr)
> 
> ... whereas we could have:
> 
> ftrace_call_init(mod, rec)
> ftrace_call_enable(rec, addr)
> ftrace_call_disable(rec, addr)
> ftrace_call_modify(rec, old_addr, new_addr)
> 
> ... so we wouldn't need to special-case anything for initialization (and
> don't need MCOUNT_ADDR at all), and it would be clearer as to what was
> happening at each stage.

Just a note. I would be OK if someone wants to work in changing the
ftrace interface to be this. I'll review the code, but I don't have the
cycles to implement such a change.

-- Steve


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-04-02 Thread Mark Rutland
Hi Torsten,

Sorry for the long delay prior to this reply.

On Fri, Jan 18, 2019 at 05:39:08PM +0100, Torsten Duwe wrote:
> Once gcc8 adds 2 NOPs at the beginning of each function, replace the
> first NOP thus generated with a quick LR saver (move it to scratch reg
> x9), so the 2nd replacement insn, the call to ftrace, does not clobber
> the value. Ftrace will then generate the standard stack frames.
> 
> Note that patchable-function-entry in GCC disables IPA-RA, which means
> ABI register calling conventions are obeyed *and* scratch registers
> such as x9 are available.
> 
> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
> after ftrace_trampoline, and double the size of this special section.
> 
> Signed-off-by: Torsten Duwe 
> 
> ---
> 
> Mark, if you see your ftrace entry macro code being represented correctly
> here, please add your sign-off, as I've initially copied it from your mail.
> 
> ---
>  arch/arm64/include/asm/ftrace.h  |   17 -
>  arch/arm64/include/asm/module.h  |3 
>  arch/arm64/kernel/entry-ftrace.S |  125 +--
>  arch/arm64/kernel/ftrace.c   |  114 ++-
>  arch/arm64/kernel/module-plts.c  |3 
>  arch/arm64/kernel/module.c   |2 
>  6 files changed, 227 insertions(+), 37 deletions(-)
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -14,9 +14,24 @@
>  #include 
>  
>  #define HAVE_FUNCTION_GRAPH_FP_TEST
> -#define MCOUNT_ADDR  ((unsigned long)_mcount)
>  #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>  
> +/*
> + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
> + * of each function, with the second NOP actually calling ftrace. Contrary
> + * to a classic _mcount call, the call instruction to be modified is thus
> + * the second one, and not the only one.
> + */
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#define ARCH_SUPPORTS_FTRACE_OPS 1
> +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> +/* All we need is some magic value. Simply use "_mCount:" */
> +#define MCOUNT_ADDR  (0x5f6d436f756e743a)

I'm really not keen on having a magic constant, and I'd really like to
see MCOUNT_ADDR disappear entirely when the compiler doesn't generate an
mcount call. I'm concerned that this is confusing and fragile.

I think that it would be better to make the core ftrace API agnostic of
mcount, and to teach it that there's a one-time initialization step for
callsites (which is not necessarily patching a call with a NOP).

We currently have:

ftrace_make_nop(mod, rec, addr)
ftrace_make_call(rec, addr)
ftrace_modify_call(rec, old_addr, new_addr)

... whereas we could have:

ftrace_call_init(mod, rec)
ftrace_call_enable(rec, addr)
ftrace_call_disable(rec, addr)
ftrace_call_modify(rec, old_addr, new_addr)

... so we wouldn't need to special-case anything for initialization (and
don't need MCOUNT_ADDR at all), and it would be clearer as to what was
happening at each stage.
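
To illustrate (a rough sketch only, using the hypothetical names above --
not part of the patch), the arm64 side might then boil down to:

	int ftrace_call_init(struct module *mod, struct dyn_ftrace *rec)
	{
		/* rec->ip points at the bl site; the NOP to turn into
		 * the LR saver sits one instruction earlier. */
		unsigned long pc = rec->ip - AARCH64_INSN_SIZE;

		return ftrace_modify_code(pc, aarch64_insn_gen_nop(),
					  MOV_X9_X30, true);
	}

	int ftrace_call_enable(struct dyn_ftrace *rec, unsigned long addr)
	{
		u32 old = aarch64_insn_gen_nop();
		u32 new = aarch64_insn_gen_branch_imm(rec->ip, addr,
						      AARCH64_INSN_BRANCH_LINK);

		return ftrace_modify_code(rec->ip, old, new, true);
	}

... with ftrace_call_disable() as the mirror image, and no MCOUNT_ADDR
special case left anywhere.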

> +#else
> +#define REC_IP_BRANCH_OFFSET 0
> +#define MCOUNT_ADDR  ((unsigned long)_mcount)
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  #include 
>  
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -10,6 +10,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
>  NOKPROBE(_mcount)
>  
>  #else /* CONFIG_DYNAMIC_FTRACE */
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  /*
>   * _mcount() is used to build the kernel with -pg option, but all the branch
> >   * instructions to _mcount() are replaced to NOP initially at kernel start up,
> @@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)// ftrace_gra
>  
>   mcount_exit
>  ENDPROC(ftrace_caller)
> -#endif /* CONFIG_DYNAMIC_FTRACE */
> -
> -ENTRY(ftrace_stub)
> - ret
> -ENDPROC(ftrace_stub)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  /*
> @@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
>  
>   mcount_exit
>  ENDPROC(ftrace_graph_caller)
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +
> + .macro  ftrace_regs_entry, allregs=0
> + /* make room for pt_regs, plus a callee frame */
> + sub sp, sp, #(S_FRAME_SIZE + 16)
> +
> + /* save function arguments */
> + stp x0, x1, [sp, #S_X0]
> + stp x2, x3, [sp, #S_X2]
> + stp x4, x5, [sp, #S_X4]
> + stp x6, x7, [sp, #S_X6]
> + stp x8, x9, [sp, #S_X8]
>  
> + .if \allregs == 1
> + stp x10, x11, [sp, #S_X10]
> + stp x12, x13, [sp, #S_X12]
> + stp x14, x15, [sp, #S_X14]
> + stp x16, x17, [sp, #S_X16]
> + stp x18, x19, [sp, #S_X18]
> + stp x20, x21, [sp, #S_X20]
> + stp x22, x23, [sp, #S_X22]
> + stp x24, x25, [sp, #S_X24]
> + stp x26, x27, [sp, #S_X26]
> + .endif
> +
> + /* Save fp and x28, which is used in this function. */
> + 

Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Torsten Duwe
On Thu, Feb 07, 2019 at 09:51:57AM -0500, Steven Rostedt wrote:
> On Thu, 7 Feb 2019 10:33:50 +
> Julien Thierry  wrote:
> 
> > I don't see really much documentation on that function. As far as I can
> > tell it is only called once for each site (and if it didn't, we'd always
> > be placing the same instruction, but I agree it wouldn't be nice). It
> > could depend on how far you can expand the notion of "adjusting" :) .
> > 
> > Steven, do you have an opinion on whether it would be acceptable to
> > modify function entry code in ftrace_call_adjust() ?
> 
> Just to make sure I'm on the same page as you are. You want to modify
> the function entry code at the time of the ftrace_call_adjust()?

Yes, this was the fiendish plan ;-)

> I would update the rec->ip to the offset you want at
> ftrace_call_adjust() but not do any modifications. It really isn't safe
> to do it there. But right after that is called, you will have the arch
> specific call of ftrace_make_nop() called with MCOUNT_ADDR as the
> second parameter to let you know that this is the first time the call
> is made at this address. This is where you can do that initial
> modifications.

Ok, so just substitute REC_IP_BRANCH_OFFSET arithmetic with
ftrace_call_adjust() and keep the MCOUNT_ADDR logic.
Thanks for the clarification.
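
So, once initialized, the function entry always stays (conceptually; the
two NOPs come from building with -fpatchable-function-entry=2, and only
the second slot is ever toggled afterwards):

	func:
		mov	x9, x30			// LR saver, patched in once
		nop / bl ftrace_regs_caller	// rec->ip, after ftrace_call_adjust()
		...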

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Julien Thierry



On 07/02/2019 14:51, Steven Rostedt wrote:
> On Thu, 7 Feb 2019 10:33:50 +
> Julien Thierry  wrote:
> 
>> I don't see really much documentation on that function. As far as I can
>> tell it is only called once for each site (and if it didn't, we'd always
>> be placing the same instruction, but I agree it wouldn't be nice). It
>> could depend on how far you can expand the notion of "adjusting" :) .
>>
>> Steven, do you have an opinion on whether it would be acceptable to
>> modify function entry code in ftrace_call_adjust() ?
> 
> Just to make sure I'm on the same page as you are. You want to modify
> the function entry code at the time of the ftrace_call_adjust()?
> 

Yes, that was the intention, the reason being that we want to have an
instruction that is just patched once on each entry for initialization.

> I would update the rec->ip to the offset you want at
> ftrace_call_adjust() but not do any modifications. It really isn't safe
> to do it there. But right after that is called, you will have the arch
> specific call of ftrace_make_nop() called with MCOUNT_ADDR as the
> second parameter to let you know that this is the first time the call
> is made at this address. This is where you can do that initial
> modifications.

Yes, this is what the current version of this patch is doing; I was just
wondering whether we could clean things up a little (i.e. have
ftrace_make_nop() not generate a non-nop instruction :) ).

But we're not going to hack through the API if this is definitely not
what it's intended for. We'll keep it as is and just update the rec->ip
with ftrace_call_adjust().

If we really want to clean things up, we could look into patching this
instruction at build time. But if it's not trivial I guess we can
keep that for a later time.

Thanks,

-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Steven Rostedt
On Thu, 7 Feb 2019 10:33:50 +
Julien Thierry  wrote:

> I don't see really much documentation on that function. As far as I can
> tell it is only called once for each site (and if it didn't, we'd always
> be placing the same instruction, but I agree it wouldn't be nice). It
> could depend on how far you can expand the notion of "adjusting" :) .
> 
> Steven, do you have an opinion on whether it would be acceptable to
> modify function entry code in ftrace_call_adjust() ?

Just to make sure I'm on the same page as you are. You want to modify
the function entry code at the time of the ftrace_call_adjust()?

I would update the rec->ip to the offset you want at
ftrace_call_adjust() but not do any modifications. It really isn't safe
to do it there. But right after that is called, you will have the arch
specific call of ftrace_make_nop() called with MCOUNT_ADDR as the
second parameter to let you know that this is the first time the call
is made at this address. This is where you can do that initial
modifications.

-- Steve


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Julien Thierry



On 07/02/2019 12:51, Torsten Duwe wrote:
> On Thu, Feb 07, 2019 at 10:33:50AM +, Julien Thierry wrote:
>>
>>
>> On 06/02/2019 15:05, Torsten Duwe wrote:
>>> On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
 Hi Torsten,

 On 18/01/2019 16:39, Torsten Duwe wrote:

> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
>   return ftrace_modify_code(pc, old, new, true);
>  }
>  
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> + unsigned long addr)
> +{
> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> + u32 old, new;
> +
> + old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> + new = aarch64_insn_gen_branch_imm(pc, addr, true);
> +
> + return ftrace_modify_code(pc, old, new, true);
> +}
> +#endif
> +
>  /*
>   * Turn off the call to ftrace_caller() in instrumented function
>   */
>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>   unsigned long addr)
>  {
> - unsigned long pc = rec->ip;
> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;

 Sorry to come back on this patch again, but I was looking at the ftrace
 code a bit, and I see that when processing the ftrace call locations,
 ftrace calls ftrace_call_adjust() on every ip registered as mcount
 caller (or in our case patchable entries). This ftrace_call_adjust() is
 arch specific, so I was thinking we could place the offset in here once
 and for all so we don't have to worry about it in the future.
>>>
>>> Now that you mention it - yes indeed that's the correct facility to fix
>>> the deviating address, as Steve has also confirmed. I had totally forgotten
>>> about this hook.
>>>
 Also, I'm unsure whether it would be safe, but we could patch the "mov
 x9, lr" there as well. In theory, this would be called at init time
 (before secondary CPUs are brought up) and when loading a module (so I'd
 expect no-one is executing that code *yet*).

 If this is possible, I think it would make things a bit cleaner.
>>>
>>> This is in fact very tempting, but it will introduce a nasty side effect
>>> to ftrace_call_adjust. Is there any obvious documentation that specifies
>>> guarantees about ftrace_call_adjust being called exactly once for each site?
>>>
>>
>> I don't see really much documentation on that function. As far as I can
>> tell it is only called once for each site (and if it didn't, we'd always
>> be placing the same instruction, but I agree it wouldn't be nice). It
>> could depend on how far you can expand the notion of "adjusting" :) .
> 
> I've been thinking this over and I'm considering making an ftrace_modify_code
> call with verify and warn_once if it fails, then reading the insn back and
> bug_on should it not be the LR saver. Any objections?
> 

Hmmm, I'm not really convinced the read back + bug part would really be
useful right after patching this instruction in. ftrace_modify_code()
should already return an error if the instruction patching failed.

A real issue would be if ftrace_call_adjust() would be called on a
location where we shouldn't patch the instruction (i.e. a location that
is not the first instruction of a patchable entry). But to me, it
doesn't look like this function is intended to be called on something
else than the "mcount callsites" (which in our case is that first
patchable instruction).

Cheers,

-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Torsten Duwe
On Thu, Feb 07, 2019 at 10:33:50AM +, Julien Thierry wrote:
> 
> 
> On 06/02/2019 15:05, Torsten Duwe wrote:
> > On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
> >> Hi Torsten,
> >>
> >> On 18/01/2019 16:39, Torsten Duwe wrote:
> >>
> >>> --- a/arch/arm64/kernel/ftrace.c
> >>> +++ b/arch/arm64/kernel/ftrace.c
> >>> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
> >>>   return ftrace_modify_code(pc, old, new, true);
> >>>  }
> >>>  
> >>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >>> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >>> + unsigned long addr)
> >>> +{
> >>> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> >>> + u32 old, new;
> >>> +
> >>> + old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> >>> + new = aarch64_insn_gen_branch_imm(pc, addr, true);
> >>> +
> >>> + return ftrace_modify_code(pc, old, new, true);
> >>> +}
> >>> +#endif
> >>> +
> >>>  /*
> >>>   * Turn off the call to ftrace_caller() in instrumented function
> >>>   */
> >>>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >>>   unsigned long addr)
> >>>  {
> >>> - unsigned long pc = rec->ip;
> >>> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> >>
> >> Sorry to come back on this patch again, but I was looking at the ftrace
> >> code a bit, and I see that when processing the ftrace call locations,
> >> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> >> caller (or in our case patchable entries). This ftrace_call_adjust() is
> >> arch specific, so I was thinking we could place the offset in here once
> >> and for all so we don't have to worry about it in the future.
> > 
> > Now that you mention it - yes indeed that's the correct facility to fix
> > the deviating address, as Steve has also confirmed. I had totally forgotten
> > about this hook.
> > 
> >> Also, I'm unsure whether it would be safe, but we could patch the "mov
> >> x9, lr" there as well. In theory, this would be called at init time
> >> (before secondary CPUs are brought up) and when loading a module (so I'd
> >> expect no-one is executing that code *yet*).
> >>
> >> If this is possible, I think it would make things a bit cleaner.
> > 
> > This is in fact very tempting, but it will introduce a nasty side effect
> > to ftrace_call_adjust. Is there any obvious documentation that specifies
> > guarantees about ftrace_call_adjust being called exactly once for each site?
> > 
> 
> I don't see really much documentation on that function. As far as I can
> tell it is only called once for each site (and if it didn't, we'd always
> be placing the same instruction, but I agree it wouldn't be nice). It
> could depend on how far you can expand the notion of "adjusting" :) .

I've been thinking this over and I'm considering making an ftrace_modify_code
call with verify and warn_once if it fails, then reading the insn back and
bug_on should it not be the LR saver. Any objections?

> Steven, do you have an opinion on whether it would be acceptable to
> modify function entry code in ftrace_call_adjust() ?

Yes, Steve's vote first.

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Julien Thierry



On 06/02/2019 15:05, Torsten Duwe wrote:
> On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
>> Hi Torsten,
>>
>> On 18/01/2019 16:39, Torsten Duwe wrote:
>>
>>> --- a/arch/arm64/kernel/ftrace.c
>>> +++ b/arch/arm64/kernel/ftrace.c
>>> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
>>> return ftrace_modify_code(pc, old, new, true);
>>>  }
>>>  
>>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>>> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>>> +   unsigned long addr)
>>> +{
>>> +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
>>> +   u32 old, new;
>>> +
>>> +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
>>> +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
>>> +
>>> +   return ftrace_modify_code(pc, old, new, true);
>>> +}
>>> +#endif
>>> +
>>>  /*
>>>   * Turn off the call to ftrace_caller() in instrumented function
>>>   */
>>>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>>> unsigned long addr)
>>>  {
>>> -   unsigned long pc = rec->ip;
>>> +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
>>
>> Sorry to come back on this patch again, but I was looking at the ftrace
>> code a bit, and I see that when processing the ftrace call locations,
>> ftrace calls ftrace_call_adjust() on every ip registered as mcount
>> caller (or in our case patchable entries). This ftrace_call_adjust() is
>> arch specific, so I was thinking we could place the offset in here once
>> and for all so we don't have to worry about it in the future.
> 
> Now that you mention it - yes indeed that's the correct facility to fix
> the deviating address, as Steve has also confirmed. I had totally forgotten
> about this hook.
> 
>> Also, I'm unsure whether it would be safe, but we could patch the "mov
>> x9, lr" there as well. In theory, this would be called at init time
>> (before secondary CPUs are brought up) and when loading a module (so I'd
>> expect no-one is executing that code *yet*).
>>
>> If this is possible, I think it would make things a bit cleaner.
> 
> This is in fact very tempting, but it will introduce a nasty side effect
> to ftrace_call_adjust. Is there any obvious documentation that specifies
> guarantees about ftrace_call_adjust being called exactly once for each site?
> 

I don't see really much documentation on that function. As far as I can
tell it is only called once for each site (and if it didn't, we'd always
be placing the same instruction, but I agree it wouldn't be nice). It
could depend on how far you can expand the notion of "adjusting" :) .

Steven, do you have an opinion on whether it would be acceptable to
modify function entry code in ftrace_call_adjust() ?

Thanks,

-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-06 Thread Torsten Duwe
On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
> Hi Torsten,
> 
> On 18/01/2019 16:39, Torsten Duwe wrote:
> 
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
> > return ftrace_modify_code(pc, old, new, true);
> >  }
> >  
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > +   unsigned long addr)
> > +{
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> > +
> > +   return ftrace_modify_code(pc, old, new, true);
> > +}
> > +#endif
> > +
> >  /*
> >   * Turn off the call to ftrace_caller() in instrumented function
> >   */
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> > unsigned long addr)
> >  {
> > -   unsigned long pc = rec->ip;
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> 
> Sorry to come back on this patch again, but I was looking at the ftrace
> code a bit, and I see that when processing the ftrace call locations,
> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> caller (or in our case patchable entries). This ftrace_call_adjust() is
> arch specific, so I was thinking we could place the offset in here once
> and for all so we don't have to worry about it in the future.

Now that you mention it - yes indeed that's the correct facility to fix
the deviating address, as Steve has also confirmed. I had totally forgotten
about this hook.

> Also, I'm unsure whether it would be safe, but we could patch the "mov
> x9, lr" there as well. In theory, this would be called at init time
> (before secondary CPUs are brought up) and when loading a module (so I'd
> expect no-one is executing that code *yet*).
> 
> If this is possible, I think it would make things a bit cleaner.

This is in fact very tempting, but it will introduce a nasty side effect
to ftrace_call_adjust. Is there any obvious documentation that specifies
guarantees about ftrace_call_adjust being called exactly once for each site?

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-06 Thread Steven Rostedt
On Wed, 6 Feb 2019 08:59:44 +
Julien Thierry  wrote:

> Hi Torsten,
> 
> On 18/01/2019 16:39, Torsten Duwe wrote:
> > Once gcc8 adds 2 NOPs at the beginning of each function, replace the
> > first NOP thus generated with a quick LR saver (move it to scratch reg
> > x9), so the 2nd replacement insn, the call to ftrace, does not clobber
> > the value. Ftrace will then generate the standard stack frames.
> > 
> > Note that patchable-function-entry in GCC disables IPA-RA, which means
> > ABI register calling conventions are obeyed *and* scratch registers
> > such as x9 are available.
> > 
> > Introduce and handle an ftrace_regs_trampoline for module PLTs, right
> > after ftrace_trampoline, and double the size of this special section.
> > 
> > Signed-off-by: Torsten Duwe 
> > 
> > ---
> > 
> > Mark, if you see your ftrace entry macro code being represented correctly
> > here, please add your sign-off, as I've initially copied it from your mail.
> > 
> > ---
> >  arch/arm64/include/asm/ftrace.h  |   17 -
> >  arch/arm64/include/asm/module.h  |3 
> >  arch/arm64/kernel/entry-ftrace.S |  125 +--
> >  arch/arm64/kernel/ftrace.c   |  114 ++-
> >  arch/arm64/kernel/module-plts.c  |3 
> >  arch/arm64/kernel/module.c   |2 
> >  6 files changed, 227 insertions(+), 37 deletions(-)  
> 
> [...]
> 
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
> > return ftrace_modify_code(pc, old, new, true);
> >  }
> >  
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > +   unsigned long addr)
> > +{
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> > +
> > +   return ftrace_modify_code(pc, old, new, true);
> > +}
> > +#endif
> > +
> >  /*
> >   * Turn off the call to ftrace_caller() in instrumented function
> >   */
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> > unsigned long addr)
> >  {
> > -   unsigned long pc = rec->ip;
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;  
> 
> Sorry to come back on this patch again, but I was looking at the ftrace
> code a bit, and I see that when processing the ftrace call locations,
> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> caller (or in our case patchable entries). This ftrace_call_adjust() is
> arch specific, so I was thinking we could place the offset in here once
> and for all so we don't have to worry about it in the future.

The ftrace_call_adjust() is there in case what is saved in the mcount
table is different than what is needed for the addresses. Which this
looks to be the case here.

-- Steve



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-06 Thread Julien Thierry



On 06/02/2019 08:59, Julien Thierry wrote:
> Hi Torsten,
> 
> On 18/01/2019 16:39, Torsten Duwe wrote:
>> Once gcc8 adds 2 NOPs at the beginning of each function, replace the
>> first NOP thus generated with a quick LR saver (move it to scratch reg
>> x9), so the 2nd replacement insn, the call to ftrace, does not clobber
>> the value. Ftrace will then generate the standard stack frames.
>>
>> Note that patchable-function-entry in GCC disables IPA-RA, which means
>> ABI register calling conventions are obeyed *and* scratch registers
>> such as x9 are available.
>>
>> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
>> after ftrace_trampoline, and double the size of this special section.
>>
>> Signed-off-by: Torsten Duwe 
>>
>> ---
>>
>> Mark, if you see your ftrace entry macro code being represented correctly
>> here, please add your sign-off, as I've initially copied it from your mail.
>>
>> ---
>>  arch/arm64/include/asm/ftrace.h  |   17 -
>>  arch/arm64/include/asm/module.h  |3 
>>  arch/arm64/kernel/entry-ftrace.S |  125 +--
>>  arch/arm64/kernel/ftrace.c   |  114 ++-
>>  arch/arm64/kernel/module-plts.c  |3 
>>  arch/arm64/kernel/module.c   |2 
>>  6 files changed, 227 insertions(+), 37 deletions(-)
> 
> [...]
> 
>> --- a/arch/arm64/kernel/ftrace.c
>> +++ b/arch/arm64/kernel/ftrace.c
>> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
>>  return ftrace_modify_code(pc, old, new, true);
>>  }
>>  
>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>> +unsigned long addr)
>> +{
>> +unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
>> +u32 old, new;
>> +
>> +old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
>> +new = aarch64_insn_gen_branch_imm(pc, addr, true);
>> +
>> +return ftrace_modify_code(pc, old, new, true);
>> +}
>> +#endif
>> +
>>  /*
>>   * Turn off the call to ftrace_caller() in instrumented function
>>   */
>>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>>  unsigned long addr)
>>  {
>> -unsigned long pc = rec->ip;
>> +unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> 
> Sorry to come back on this patch again, but I was looking at the ftrace
> code a bit, and I see that when processing the ftrace call locations,
> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> caller (or in our case patchable entries). This ftrace_call_adjust() is
> arch specific, so I was thinking we could place the offset in here once
> and for all so we don't have to worry about it in the future.
> 
> Also, I'm unsure whether it would be safe, but we could patch the "mov
> x9, lr" there as well. In theory, this would be called at init time
> (before secondary CPUs are brought up) and when loading a module (so I'd
> expect no-one is executing that code *yet*).
> 

And if the above works, we could also define CC_HAVE_NOP_MCOUNT (when we
have the patchable entry) and then we would not have to worry about the
initial ftrace_make_nop() with MCOUNT_ADDR at all.

-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-06 Thread Julien Thierry
Hi Torsten,

On 18/01/2019 16:39, Torsten Duwe wrote:
> Once gcc8 adds 2 NOPs at the beginning of each function, replace the
> first NOP thus generated with a quick LR saver (move it to scratch reg
> x9), so the 2nd replacement insn, the call to ftrace, does not clobber
> the value. Ftrace will then generate the standard stack frames.
> 
> Note that patchable-function-entry in GCC disables IPA-RA, which means
> ABI register calling conventions are obeyed *and* scratch registers
> such as x9 are available.
> 
> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
> after ftrace_trampoline, and double the size of this special section.
> 
> Signed-off-by: Torsten Duwe 
> 
> ---
> 
> Mark, if you see your ftrace entry macro code being represented correctly
> here, please add your sign-off, as I've initially copied it from your mail.
> 
> ---
>  arch/arm64/include/asm/ftrace.h  |   17 -
>  arch/arm64/include/asm/module.h  |3 
>  arch/arm64/kernel/entry-ftrace.S |  125 +--
>  arch/arm64/kernel/ftrace.c   |  114 ++-
>  arch/arm64/kernel/module-plts.c  |3 
>  arch/arm64/kernel/module.c   |2 
>  6 files changed, 227 insertions(+), 37 deletions(-)

[...]

> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
>   return ftrace_modify_code(pc, old, new, true);
>  }
>  
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> + unsigned long addr)
> +{
> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> + u32 old, new;
> +
> + old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> + new = aarch64_insn_gen_branch_imm(pc, addr, true);
> +
> + return ftrace_modify_code(pc, old, new, true);
> +}
> +#endif
> +
>  /*
>   * Turn off the call to ftrace_caller() in instrumented function
>   */
>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>   unsigned long addr)
>  {
> - unsigned long pc = rec->ip;
> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;

Sorry to come back on this patch again, but I was looking at the ftrace
code a bit, and I see that when processing the ftrace call locations,
ftrace calls ftrace_call_adjust() on every ip registered as mcount
caller (or in our case patchable entries). This ftrace_call_adjust() is
arch specific, so I was thinking we could place the offset in here once
and for all so we don't have to worry about it in the future.
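
Something like this, I suppose (just a sketch, reusing REC_IP_BRANCH_OFFSET
from this patch):

	/* arch/arm64/include/asm/ftrace.h */
	static inline unsigned long ftrace_call_adjust(unsigned long addr)
	{
		/* addr is the first NOP; the insn to patch is the second one */
		return addr + REC_IP_BRANCH_OFFSET;
	}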

Also, I'm unsure whether it would be safe, but we could patch the "mov
x9, lr" there as well. In theory, this would be called at init time
(before secondary CPUs are brought up) and when loading a module (so I'd
expect no-one is executing that code *yet*).

If this is possible, I think it would make things a bit cleaner.

Let me know if that would work.

Thanks,

-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-04 Thread Ard Biesheuvel
On Mon, 4 Feb 2019 at 13:03, Torsten Duwe  wrote:
>
> On Tue, Jan 22, 2019 at 02:55:12PM +0100, Ard Biesheuvel wrote:
> > On Tue, 22 Jan 2019 at 14:28, Torsten Duwe  wrote:
> > >
> > > On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> > > > Hi Torsten,
> > > >
> > > > A few suggestions below.
> > > >
> > > > > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > > > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > > > > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > > > > +/* All we need is some magic value. Simply use "_mCount:" */
> > > >
> > > > Nit: Should the casing be "_mcount" ?
> > >
> > > No! The string makes it clear what it's supposed to be and the peculiar
> > > casing makes it unique and leaves no doubt where it came from. So whenever
> > > you see this register value in a crash dump you don't have to wonder about
> > > weird linkage errors, as it surely did not originate from a symtab.
> > >
> >
> > In that case, do you need to deal with endianness here?
> >
> > > > > +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
>
> Strictly speaking, yes. OTOH "wrong-endian" machines always show a difference
> when memory is dumped in bigger units than bytes, so when you see the register
> value in hex...
>
> Since all that's needed is a somewhat unique constant, let's not over-engineer
> this, OK?
>

Ah ok, if it is only for visual recognition, then I guess it doesn't matter.

> If there are no other comments I'll send out v8 with the discussed changes.
>
> Torsten
>


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-04 Thread Torsten Duwe
On Tue, Jan 22, 2019 at 02:55:12PM +0100, Ard Biesheuvel wrote:
> On Tue, 22 Jan 2019 at 14:28, Torsten Duwe  wrote:
> >
> > On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> > > Hi Torsten,
> > >
> > > A few suggestions below.
> > >
> > > > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > > > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > > > +/* All we need is some magic value. Simply use "_mCount:" */
> > >
> > > Nit: Should the casing be "_mcount" ?
> >
> > No! The string makes it clear what it's supposed to be and the peculiar
> > casing makes it unique and leaves no doubt where it came from. So whenever
> > you see this register value in a crash dump you don't have to wonder about
> > weird linkage errors, as it surely did not originate from a symtab.
> >
> 
> In that case, do you need to deal with endianness here?
> 
> > > > +#define MCOUNT_ADDR  (0x5f6d436f756e743a)

Strictly speaking, yes. OTOH "wrong-endian" machines always show a difference
when memory is dumped in bigger units than bytes, so when you see the register
value in hex...

Since all that's needed is a somewhat unique constant, let's not over-engineer
this, OK?
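
(For the record, the constant is plain ASCII: 5f 6d 43 6f 75 6e 74 3a ==
"_mCount:" read MSB-first; a little-endian byte dump shows
"3a 74 6e 75 6f 43 6d 5f" instead.)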

If there are no other comments I'll send out v8 with the discussed changes.

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-23 Thread Singh, Balbir



On 1/23/19 2:09 AM, Torsten Duwe wrote:
> Hi Balbir!
> 

Hi, Torsten!

> On Tue, Jan 22, 2019 at 02:39:32PM +1300, Singh, Balbir wrote:
>>
>> On 1/19/19 5:39 AM, Torsten Duwe wrote:
>>> + */
>>> +ftrace_common_return:
>>> +   /* restore function args */
>>> +   ldp x0, x1, [sp]
>>> +   ldp x2, x3, [sp, #S_X2]
>>> +   ldp x4, x5, [sp, #S_X4]
>>> +   ldp x6, x7, [sp, #S_X6]
>>> +   ldr x8, [sp, #S_X8]
>>> +
>>> +   /* restore fp and x28 */
>>> +   ldp x28, x29, [sp, #S_X28]
>>> +
>>> +   ldr lr, [sp, #S_LR]
>>> +   ldr x9, [sp, #S_PC]
>>
>> Is it fair to assume that we never modify registers beyond LR and PC as a 
>> result of ftrace/livepatching? I presume it is, but just checking.
> 
> These are either callee-save or scratch. Whatever is called, ftrace framework
> functions or replacement functions, must preserve the callee-saved regs; and
> the caller, who made a function call (sic!-) saves caller-saved and marks the
> rest dead on return. So it's the arguments that matter after all.
> 
> As you can see, disabling IPA-RA is crucial here.
> 
> Or are you talking about deliberate argument manipulation?

I wonder if another use of ftrace via register_ftrace_ops can expect to modify
arguments, like we modify the PC and RA.

> 
>>> +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
>>> +   u32 old, new;
>>> +
>>> +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
>>> +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
>>> +
>>
>> Is this a branch or a call? Does addr always fit in the immediate limits?
> 
> As Julien has now pointed out, the correct enum value AARCH64_INSN_BRANCH_LINK
> should clarify this. It will surely fit for the kernel proper, and the modules
> are handled with the trampolines.

OK, so there is an assumption of the size of vmlinux being in a certain range? 
I suspect something like an allyesconfig with debug might be a worthy challenger 
:) Yes, modules do get trampolines.

> 
>>> +   return ftrace_modify_code(pc, old, new, true);
>>
>> Can you talk to the semantics of whether this operation is atomic w.r.t 
>> system? Will old and new return consistent values? Given the nature of 
>> ftrace, I presume it's well isolated. 
> 
> aarch64_insn_patch_text_nosync() does a __flush_icache_range() on success.
> Mark wrote that this is already sufficient IIRC. (I had memory barriers
> there, when I was still trying to modify 2 insns every time).
> 
>>
>>> +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
>>> +   addr == MCOUNT_ADDR) {
>>> +   old = aarch64_insn_gen_nop();
>>> +   new = MOV_X9_X30;
>>> +   pc -= REC_IP_BRANCH_OFFSET;
>>> +   return ftrace_modify_code(pc, old, new, validate);
>>
>> I presume all the icache flush and barrier handling is in 
>> ftrace_modify_code()?
> 
> Yes, see above.
> 

Thanks!

>>> +   }
>>> +
>>> if (offset < -SZ_128M || offset >= SZ_128M) {
>>>  #ifdef CONFIG_ARM64_MODULE_PLTS
>>> u32 replaced;
>>> --- a/arch/arm64/include/asm/module.h
>>> +++ b/arch/arm64/include/asm/module.h
>>> @@ -32,7 +32,8 @@ struct mod_arch_specific {
>>> struct mod_plt_sec  init;
>>>  
>>> /* for CONFIG_DYNAMIC_FTRACE */
>>> -   struct plt_entry*ftrace_trampoline;
>>> +   struct plt_entry*ftrace_trampolines;
>>> +#define MOD_ARCH_NR_FTRACE_TRAMPOLINES 2
>>
>> I don't see the generation of ftrace_trampolines[1]
>>
> 
> That was further up, install_ftrace_trampoline() in kernel/ftrace.c.
> 

Thanks!

> +   if (*addr == FTRACE_ADDR)
> +   mod_trampoline = &mod->arch.ftrace_trampolines[0];
> +   else if (*addr == FTRACE_REGS_ADDR)
> +   mod_trampoline = &mod->arch.ftrace_trampolines[1];
> [...]
> +   trampoline = get_plt_entry(*addr, mod_trampoline);
> +
> +   if (!plt_entries_equal(mod_trampoline, &trampoline)) {
> [...]
> 
> get_plt_entry() generates a small bunch of instructions that easily
> fit into the argument registers. Compare commit bdb85cd1d20669dfae8
> for the new trampoline insns.
> 
> Hope I've covered all your concerns,
> 

Yes, largely. Thank you!

Balbir


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Ard Biesheuvel
On Tue, 22 Jan 2019 at 14:28, Torsten Duwe  wrote:
>
> On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> > Hi Torsten,
> >
> > A few suggestions below.
> >
> > > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > > +/* All we need is some magic value. Simply use "_mCount:" */
> >
> > Nit: Should the casing be "_mcount" ?
>
> No! The string makes it clear what it's supposed to be and the peculiar
> casing makes it unique and leaves no doubt where it came from. So whenever
> you see this register value in a crash dump you don't have to wonder about
> weird linkage errors, as it surely did not originate from a symtab.
>

In that case, do you need to deal with endianness here?

> > > +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
> > > +#else
> > > +#define REC_IP_BRANCH_OFFSET 0
> > > +#define MCOUNT_ADDR  ((unsigned long)_mcount)
> > > +#endif
> > > +
> > >  #ifndef __ASSEMBLY__
> > >  #include 
> > >
> > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > @@ -10,6 +10,7 @@
> > >   */
> > >
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> ^^^
> > > +
> > > +ENTRY(ftrace_common)
> > > +
> > > +   mov x3, sp  /* pt_regs are @sp */
> > > +   ldr_l   x2, function_trace_op, x0
> > > +   mov x1, x9  /* parent IP */
> > > +   sub x0, lr, #8  /* function entry == IP */
> >
> > The #8 is because we go back two instructions right? Can we use
> > #(AARCH64_INSN_SIZE * 2) instead?
>
> Hmm, <asm/insn.h> was already included, so why not. (same below)
>
> > > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > > +   unsigned long addr)
> > > +{
> > > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > > +   u32 old, new;
> > > +
> > > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> >
> > The last argument of aarch64_insn_gen_branch_imm() is an enum, not a
> > boolean.
> >
> > You should use AARCH64_INSN_BRANCH_LINK here which happens to be equal to 1.
>
> That's what you get when you keep copying code...
> Good catch, thanks!
>
> > > +*  initially; the NOPs are already there. So instead,
> > > +*  put the LR saver there ahead of time, in order to avoid
> > > +*  any race condition over patching 2 instructions.
> > > +*/
> > > +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> > > +   addr == MCOUNT_ADDR) {
> >
> > This works, however it feels a bit weird since core code asked to
> > generate a NOP but not only we don't generate it but we put another
> > instruction instead.
>
> It's not the first time weird x86 sets the standards and all others
> need workarounds. But here it just came handy to abuse this call :-)
>
> > I think it would be useful to state that the replacement of mcount calls
> > by nops is only ever done once at system initialization.
> >
> > Or maybe have a intermediate function:
> >
> > static int ftrace_setup_patchable_entry(unsigned long pc)
> > {
> >   u32 old, new;
> >
> >   old = aarch64_insn_gen_nop();
> >   new = MOV_X9_X30;
> >   pc -= REC_IP_BRANCH_OFFSET;
> >   return ftrace_modify_code(pc, old, new, validate);
> > }
> >
> >   [...]
> >
> >   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> >   addr == MCOUNT_ADDR)
> >   return ftrace_setup_patchable_entry(pc);
> >
> >
> > This way it clearly show that this is a setup/init corner case.
>
> I'll definitely consider this.
>
> Thanks for the input,
>
> Torsten
>
>


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Julien Thierry



On 22/01/2019 13:28, Torsten Duwe wrote:
> On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
>> Hi Torsten,
>>
>> A few suggestions below.
>>
>>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>>> +#define ARCH_SUPPORTS_FTRACE_OPS 1
>>> +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
>>> +/* All we need is some magic value. Simply use "_mCount:" */
>>
>> Nit: Should the casing be "_mcount" ?
> 
> No! The string makes it clear what it's supposed to be and the peculiar
> casing makes it unique and leaves no doubt where it came from. So whenever
> you see this register value in a crash dump you don't have to wonder about
> weird linkage errors, as it surely did not originate from a symtab.
> 

Right, I had missed the point that the value below is actually the
binary representation of that string. Things make more sense now, thanks.

>>> +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
>>> +#else
>>> +#define REC_IP_BRANCH_OFFSET 0
>>> +#define MCOUNT_ADDR  ((unsigned long)_mcount)
>>> +#endif
>>> +


-- 
Julien Thierry


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Torsten Duwe
On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> Hi Torsten,
> 
> A few suggestions below.
> 
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > +/* All we need is some magic value. Simply use "_mCount:" */
> 
> Nit: Should the casing be "_mcount" ?

No! The string makes it clear what it's supposed to be and the peculiar
casing makes it unique and leaves no doubt where it came from. So whenever
you see this register value in a crash dump you don't have to wonder about
weird linkage errors, as it surely did not originate from a symtab.

> > +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
> > +#else
> > +#define REC_IP_BRANCH_OFFSET 0
> > +#define MCOUNT_ADDR  ((unsigned long)_mcount)
> > +#endif
> > +
> >  #ifndef __ASSEMBLY__
> >  #include 
> >  
> > --- a/arch/arm64/kernel/entry-ftrace.S
> > +++ b/arch/arm64/kernel/entry-ftrace.S
> > @@ -10,6 +10,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
^^^
> > +
> > +ENTRY(ftrace_common)
> > +
> > +   mov x3, sp  /* pt_regs are @sp */
> > +   ldr_l   x2, function_trace_op, x0
> > +   mov x1, x9  /* parent IP */
> > +   sub x0, lr, #8  /* function entry == IP */
> 
> The #8 is because we go back two instructions right? Can we use
> #(AARCH64_INSN_SIZE * 2) instead?

Hmm, <asm/insn.h> was already included, so why not. (same below)

> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > +   unsigned long addr)
> > +{
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> 
> The last argument of aarch64_insn_gen_branch_imm() is an enum, not a
> boolean.
> 
> You should use AARCH64_INSN_BRANCH_LINK here which happens to be equal to 1.

That's what you get when you keep copying code...
Good catch, thanks!

> > +*  initially; the NOPs are already there. So instead,
> > +*  put the LR saver there ahead of time, in order to avoid
> > +*  any race condition over patching 2 instructions.
> > +*/
> > +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> > +   addr == MCOUNT_ADDR) {
> 
> This works, however it feels a bit weird since core code asked to
> generate a NOP but not only we don't generate it but we put another
> instruction instead.

It's not the first time weird x86 sets the standards and all others
need workarounds. But here it just came handy to abuse this call :-)
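
For reference: MOV_X9_X30 encodes "mov x9, x30", i.e. ORR x9, xzr, x30,
opcode 0xaa1e03e9. One way to generate it (a sketch -- the patch may spell
it differently):

	#define MOV_X9_X30 aarch64_insn_gen_logical_shifted_reg(	\
			AARCH64_INSN_REG_9, AARCH64_INSN_REG_ZR,	\
			AARCH64_INSN_REG_30, 0,				\
			AARCH64_INSN_VARIANT_64BIT,			\
			AARCH64_INSN_LOGIC_ORR)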

> I think it would be useful to state that the replacement of mcount calls
> by nops is only ever done once at system initialization.
> 
> Or maybe have a intermediate function:
> 
> static int ftrace_setup_patchable_entry(unsigned long pc)
> {
>   u32 old, new;
> 
>   old = aarch64_insn_gen_nop();
>   new = MOV_X9_X30;
>   pc -= REC_IP_BRANCH_OFFSET;
>   return ftrace_modify_code(pc, old, new, validate);
> }
> 
>   [...]
> 
>   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
>   addr == MCOUNT_ADDR)
>   return ftrace_setup_patchable_entry(pc);
> 
> 
> This way it clearly show that this is a setup/init corner case.

I'll definitely consider this.

Thanks for the input,

Torsten




Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Torsten Duwe
Hi Balbir!

On Tue, Jan 22, 2019 at 02:39:32PM +1300, Singh, Balbir wrote:
> 
> On 1/19/19 5:39 AM, Torsten Duwe wrote:
> > + */
> > +ftrace_common_return:
> > +   /* restore function args */
> > +   ldp x0, x1, [sp]
> > +   ldp x2, x3, [sp, #S_X2]
> > +   ldp x4, x5, [sp, #S_X4]
> > +   ldp x6, x7, [sp, #S_X6]
> > +   ldr x8, [sp, #S_X8]
> > +
> > +   /* restore fp and x28 */
> > +   ldp x28, x29, [sp, #S_X28]
> > +
> > +   ldr lr, [sp, #S_LR]
> > +   ldr x9, [sp, #S_PC]
> 
> Is it fair to assume that we never modify registers beyond LR and PC as a 
> result of ftrace/livepatching? I presume it is, but just checking.

These are either callee-save or scratch. Whatever is called, ftrace framework
functions or replacement functions, must preserve the callee-saved regs; and
the caller, who made a function call (sic!-) saves caller-saved and marks the
rest dead on return. So it's the arguments that matter after all.

As you can see, disabling IPA-RA is crucial here.

Or are you talking about deliberate argument manipulation?

> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> > +
> 
> Is this a branch or a call? Does addr always fit in the immediate limits?

As Julien has now pointed out, the correct enum value AARCH64_INSN_BRANCH_LINK
should clarify this. It will surely fit for the kernel proper, and the modules
are handled with the trampolines.

> > +   return ftrace_modify_code(pc, old, new, true);
> 
> Can you talk to the semantics of whether this operation is atomic w.r.t 
> system? Will old and new return consistent values? Given the nature of 
> ftrace, I presume it's well isolated. 

aarch64_insn_patch_text_nosync() does a __flush_icache_range() on success.
Mark wrote that this is already sufficient IIRC. (I had memory barriers
there, when I was still trying to modify 2 insns every time).
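
For reference, that helper is essentially (simplified):

	static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
				      bool validate)
	{
		u32 replaced;

		if (validate) {
			if (aarch64_insn_read((void *)pc, &replaced))
				return -EFAULT;
			if (replaced != old)
				return -EINVAL;
		}
		if (aarch64_insn_patch_text_nosync((void *)pc, new))
			return -EPERM;

		return 0;
	}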

> 
> > +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> > +   addr == MCOUNT_ADDR) {
> > +   old = aarch64_insn_gen_nop();
> > +   new = MOV_X9_X30;
> > +   pc -= REC_IP_BRANCH_OFFSET;
> > +   return ftrace_modify_code(pc, old, new, validate);
> 
> I presume all the icache flush and barrier handling is in 
> ftrace_modify_code()?

Yes, see above.

> > +   }
> > +
> > if (offset < -SZ_128M || offset >= SZ_128M) {
> >  #ifdef CONFIG_ARM64_MODULE_PLTS
> > u32 replaced;
> > --- a/arch/arm64/include/asm/module.h
> > +++ b/arch/arm64/include/asm/module.h
> > @@ -32,7 +32,8 @@ struct mod_arch_specific {
> > struct mod_plt_sec  init;
> >  
> > /* for CONFIG_DYNAMIC_FTRACE */
> > -   struct plt_entry*ftrace_trampoline;
> > +   struct plt_entry*ftrace_trampolines;
> > +#define MOD_ARCH_NR_FTRACE_TRAMPOLINES 2
> 
> I don't see the generation of ftrace_trampolines[1]
> 

That was further up, install_ftrace_trampoline() in kernel/ftrace.c.

+   if (*addr == FTRACE_ADDR)
+   mod_trampoline = &mod->arch.ftrace_trampolines[0];
+   else if (*addr == FTRACE_REGS_ADDR)
+   mod_trampoline = &mod->arch.ftrace_trampolines[1];
[...]
+   trampoline = get_plt_entry(*addr, mod_trampoline);
+
+   if (!plt_entries_equal(mod_trampoline, &trampoline)) {
[...]

get_plt_entry() generates a small bunch of instructions that easily
fit into the argument registers. Compare commit bdb85cd1d20669dfae8
for the new trampoline insns.
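
For reference, after that commit a PLT entry is just three instructions
(from asm/module.h, comment abridged):

	struct plt_entry {
		/*
		 * AAPCS64 allows a veneer to clobber x16 (IP0) and x17 (IP1),
		 * so an adrp/add/br sequence via x16 is sufficient.
		 */
		__le32	adrp;	/* adrp	x16, ....		*/
		__le32	add;	/* add	x16, x16, #....		*/
		__le32	br;	/* br	x16			*/
	};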

Hope I've covered all your concerns,

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Julien Thierry
Hi Torsten,

A few suggestions below.

On 18/01/2019 16:39, Torsten Duwe wrote:
> Once gcc8 adds 2 NOPs at the beginning of each function, replace the
> first NOP thus generated with a quick LR saver (move it to scratch reg
> x9), so the 2nd replacement insn, the call to ftrace, does not clobber
> the value. Ftrace will then generate the standard stack frames.
> 
> Note that patchable-function-entry in GCC disables IPA-RA, which means
> ABI register calling conventions are obeyed *and* scratch registers
> such as x9 are available.
> 
> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
> after ftrace_trampoline, and double the size of this special section.
> 
> Signed-off-by: Torsten Duwe 
> 
> ---
> 
> Mark, if you see your ftrace entry macro code being represented correctly
> here, please add your sign-off, as I've initially copied it from your mail.
> 
> ---
>  arch/arm64/include/asm/ftrace.h  |   17 -
>  arch/arm64/include/asm/module.h  |3 
>  arch/arm64/kernel/entry-ftrace.S |  125 +--
>  arch/arm64/kernel/ftrace.c   |  114 ++-
>  arch/arm64/kernel/module-plts.c  |3 
>  arch/arm64/kernel/module.c   |2 
>  6 files changed, 227 insertions(+), 37 deletions(-)
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -14,9 +14,24 @@
>  #include 
>  
>  #define HAVE_FUNCTION_GRAPH_FP_TEST
> -#define MCOUNT_ADDR  ((unsigned long)_mcount)
>  #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>  
> +/*
> + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
> + * of each function, with the second NOP actually calling ftrace. Contrary
> + * to a classic _mcount call, the call instruction to be modified is thus
> + * the second one, and not the only one.
> + */
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#define ARCH_SUPPORTS_FTRACE_OPS 1
> +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> +/* All we need is some magic value. Simply use "_mCount:" */

Nit: Should the casing be "_mcount" ?

> +#define MCOUNT_ADDR  (0x5f6d436f756e743a)
> +#else
> +#define REC_IP_BRANCH_OFFSET 0
> +#define MCOUNT_ADDR  ((unsigned long)_mcount)
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  #include 
>  
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -10,6 +10,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
>  NOKPROBE(_mcount)
>  
>  #else /* CONFIG_DYNAMIC_FTRACE */
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  /*
>   * _mcount() is used to build the kernel with -pg option, but all the branch
>   * instructions to _mcount() are replaced to NOP initially at kernel start up,
> @@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)// ftrace_gra
>  
>   mcount_exit
>  ENDPROC(ftrace_caller)
> -#endif /* CONFIG_DYNAMIC_FTRACE */
> -
> -ENTRY(ftrace_stub)
> - ret
> -ENDPROC(ftrace_stub)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  /*
> @@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
>  
>   mcount_exit
>  ENDPROC(ftrace_graph_caller)
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +
> + .macro  ftrace_regs_entry, allregs=0
> + /* make room for pt_regs, plus a callee frame */
> + sub sp, sp, #(S_FRAME_SIZE + 16)
> +
> + /* save function arguments */
> + stp x0, x1, [sp, #S_X0]
> + stp x2, x3, [sp, #S_X2]
> + stp x4, x5, [sp, #S_X4]
> + stp x6, x7, [sp, #S_X6]
> + stp x8, x9, [sp, #S_X8]
>  
> + .if \allregs == 1
> + stp x10, x11, [sp, #S_X10]
> + stp x12, x13, [sp, #S_X12]
> + stp x14, x15, [sp, #S_X14]
> + stp x16, x17, [sp, #S_X16]
> + stp x18, x19, [sp, #S_X18]
> + stp x20, x21, [sp, #S_X20]
> + stp x22, x23, [sp, #S_X22]
> + stp x24, x25, [sp, #S_X24]
> + stp x26, x27, [sp, #S_X26]
> + .endif
> +
> + /* Save fp and x28, which is used in this function. */
> + stp x28, x29, [sp, #S_X28]
> +
> + /* The stack pointer as it was on ftrace_caller entry... */
> + add x28, sp, #(S_FRAME_SIZE + 16)
> + /* ...and the link Register at callee entry */
> + stp x9, x28, [sp, #S_LR]/* to pt_regs.r[30] and .sp */
> +
> + /* The program counter just after the ftrace call site */
> + str lr, [sp, #S_PC]
> +
> + /* Now fill in callee's preliminary stackframe. */
> + stp x29, x9, [sp, #S_FRAME_SIZE]
> + /* Let FP point to it. */
> + add x29, sp, #S_FRAME_SIZE
> +
> + /* Our stackframe, stored inside pt_regs. */
> + stp x29, x30, [sp, #S_STACKFRAME]
> + add x29, sp, #S_STACKFRAME
> + .endm
> +
> +ENTRY(ftrace_regs_caller)
> + ftrace_regs_entry   1
> + b   ftrace_common
> +ENDPROC(ftrace_regs_caller)
> +
> +ENTRY(ftrace_caller)
> +

[PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-18 Thread Torsten Duwe
Once gcc8 adds 2 NOPs at the beginning of each function, replace the
first NOP thus generated with a quick LR saver (move it to scratch reg
x9), so the 2nd replacement insn, the call to ftrace, does not clobber
the value. Ftrace will then generate the standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.
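
To illustrate (an editorial sketch, not part of the patch itself): with
-fpatchable-function-entry=2, every traced function starts out as

        func:
                nop                     // patched to: mov x9, x30
                nop                     // patched to: bl ftrace_regs_caller
                                        //         (or bl ftrace_caller)
                <function body>

so the first patched instruction stashes the return address in x9 before
the second one branches into the ftrace entry code.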

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section.

Signed-off-by: Torsten Duwe 

---

Mark, if you see your ftrace entry macro code being represented correctly
here, please add your sign-off, as I initially copied it from your mail.

---
 arch/arm64/include/asm/ftrace.h  |   17 -
 arch/arm64/include/asm/module.h  |3 
 arch/arm64/kernel/entry-ftrace.S |  125 +--
 arch/arm64/kernel/ftrace.c   |  114 ++-
 arch/arm64/kernel/module-plts.c  |3 
 arch/arm64/kernel/module.c   |2 
 6 files changed, 227 insertions(+), 37 deletions(-)
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -14,9 +14,24 @@
 #include 
 
 #define HAVE_FUNCTION_GRAPH_FP_TEST
-#define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+/* All we need is some magic value. Simply use "_mCount:" */
+#define MCOUNT_ADDR(0x5f6d436f756e743a)
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#define MCOUNT_ADDR((unsigned long)_mcount)
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
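(The arch/arm64/kernel/ftrace.c side, not quoted here, consumes this
offset when patching. A hedged sketch of the pattern, using the existing
arm64 insn helpers; make_call_sketch() is a made-up name, and
ftrace_modify_code() is the helper local to arch/arm64/kernel/ftrace.c:)

        #include <linux/ftrace.h>
        #include <asm/insn.h>

        /* Sketch: the instruction ftrace patches is the *second* NOP,
         * REC_IP_BRANCH_OFFSET bytes past the recorded entry address. */
        static int make_call_sketch(struct dyn_ftrace *rec, unsigned long addr)
        {
                unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
                u32 old = aarch64_insn_gen_nop();
                u32 new = aarch64_insn_gen_branch_imm(pc, addr,
                                                      AARCH64_INSN_BRANCH_LINK);

                return ftrace_modify_code(pc, old, new, true);
        }
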
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
 NOKPROBE(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)  // ftrace_graph_caller()
 
mcount_exit
 ENDPROC(ftrace_caller)
-#endif /* CONFIG_DYNAMIC_FTRACE */
-
-ENTRY(ftrace_stub)
-   ret
-ENDPROC(ftrace_stub)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 /*
@@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
 
mcount_exit
 ENDPROC(ftrace_graph_caller)
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+   .macro  ftrace_regs_entry, allregs=0
+   /* make room for pt_regs, plus a callee frame */
+   sub sp, sp, #(S_FRAME_SIZE + 16)
+
+   /* save function arguments */
+   stp x0, x1, [sp, #S_X0]
+   stp x2, x3, [sp, #S_X2]
+   stp x4, x5, [sp, #S_X4]
+   stp x6, x7, [sp, #S_X6]
+   stp x8, x9, [sp, #S_X8]
 
+   .if \allregs == 1
+   stp x10, x11, [sp, #S_X10]
+   stp x12, x13, [sp, #S_X12]
+   stp x14, x15, [sp, #S_X14]
+   stp x16, x17, [sp, #S_X16]
+   stp x18, x19, [sp, #S_X18]
+   stp x20, x21, [sp, #S_X20]
+   stp x22, x23, [sp, #S_X22]
+   stp x24, x25, [sp, #S_X24]
+   stp x26, x27, [sp, #S_X26]
+   .endif
+
+   /* Save fp and x28, which is used in this function. */
+   stp x28, x29, [sp, #S_X28]
+
+   /* The stack pointer as it was on ftrace_caller entry... */
+   add x28, sp, #(S_FRAME_SIZE + 16)
+   /* ...and the link register at callee entry */
+   stp x9, x28, [sp, #S_LR]/* to pt_regs.r[30] and .sp */
+
+   /* The program counter just after the ftrace call site */
+   str lr, [sp, #S_PC]
+
+   /* Now fill in callee's preliminary stackframe. */
+   stp x29, x9, [sp, #S_FRAME_SIZE]
+   /* Let FP point to it. */
+   add x29, sp, #S_FRAME_SIZE
+
+   /* Our stackframe, stored inside pt_regs. */
+   stp x29, x30, [sp, #S_STACKFRAME]
+   add x29, sp, #S_STACKFRAME
+   .endm
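
        /*
         * [Editorial sketch, not part of the patch] Frame built by
         * ftrace_regs_entry above, offsets per asm-offsets.h:
         *
         *   sp + S_FRAME_SIZE + 16 == the traced function's SP
         *                             (saved into pt_regs.sp via x28)
         *   sp + S_FRAME_SIZE + 8  :  x9, the traced function's LR \ preliminary
         *   sp + S_FRAME_SIZE      :  the traced function's x29    / frame record
         *   sp + S_STACKFRAME      :  {x29, x30}, our own frame record,
         *                             where x29 finally points
         *   sp + 0                 :  struct pt_regs (argument regs, LR,
         *                             SP, and PC saved above)
         */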
+
+ENTRY(ftrace_regs_caller)
+   ftrace_regs_entry   1
+   b   ftrace_common
+ENDPROC(ftrace_regs_caller)
+
+ENTRY(ftrace_caller)
+   ftrace_regs_entry   0
+   b   ftrace_common
+ENDPROC(ftrace_caller)
+
+ENTRY(ftrace_common)
+
+   mov x3, sp  /* pt_regs are @sp */
+   ldr_l   x2, function_trace_op, x0
+   mov x1, x9  /* parent IP */
+   sub x0, lr, #8  /* function entry == IP */
+