Re: drivers/char/random.c needs a (new) maintainer

2020-12-23 Thread Torsten Duwe
On Fri, 18 Dec 2020 10:25:19 -0300
Marcelo Henrique Cerri  wrote:

> Hi, Ted and Jason.
> 
> Any updates on that?
> 
> I don't believe Torsten's concerns are simply about *applying* patches
> but more about these long periods of radio silence. That kills

Exactly. I could live with replies in the style of "old" Linus like:
"Your code is crap, because it does X and Y". Then I knew how to
proceed. But this extended silence slows things down a lot.

> collaboration and disengages people. More than simply reviewing patches
> I would expect a maintainer to give directions and drive the
> community. Asking Jason to review Nicolai's patches was a step towards
> that, but I believe we still could benefit from better communication.

Even regarding this I'm not so sure it was a good idea. Jason seems to
narrow the proposed changes down to "FIPS certification", when they
actually cover a lot more. I think his motivation suffers because of his
personal dislike.

> Besides Nicolai's RFC, are you also planning to take another look at
> Stephan's patches?

Yes, please advise! For important, major changes the maintainer should
ping the contributors, not vice versa. Not to mention the bunch of
minor changes pending, some even acked by independent developers.

Torsten


drivers/char/random.c needs a (new) maintainer

2020-11-30 Thread Torsten Duwe
Hi Linus!

AFAIK it's legit to bother you directly with issues like this one?

I see certifications as the mere messengers here which tell us that
our /dev/random is technologically outdated. Input entropy amounts are
guesstimated in advance, obviously much too conservatively, compiled in
and never checked thereafter; the whitening is done using some home-
grown hash function derivative and other non-cryptographic, non-standard
operations.

All of this does not affect the Linux kernel directly: it will compile
happily and run smoothly with all existing crypto applications. Only new
crypto keys are generated more slowly than necessary or, much worse, might
contain less entropy than required because something broke down
unnoticed.  In that case, problems would arise only much later, but in
the real world and with much graver impact. I would rather see the
Linux /dev/random be reliable, whether certified or not. If it also
provided that reliable entropy fast, that would be even better. If it
were at least possible to get approval from a standardization body
(without forcing this onto all users, of course), that would be optimal.

Meanwhile there's quite a maintenance backlog; minor fixes are
pending, medium-sized cleanups are ignored and major patch sets to add
the missing features are not even discussed. (I'm deliberately not
including links here to avoid excessive finger pointing.)

I'd like to believe that Ted is too busy working on ext4, but,
especially on explicit request, a "hold on, I'm busy, will get to it
later" or "right, does someone want to take over?" would be appropriate
IMHO. It is also not helpful to object to or ignore all changes which
might benefit certifications just for that sole reason and because of
personal aversion. No reply at all yields exactly the same result as
having no maintainer at all, hence the subject.

Could you please try to get a definite answer from him? I know there
is at least one person (probably more) with enough enthusiasm and
expertise who would happily take over, should that turn out to be a
problem.

Thanks,

Torsten



Re: [PATCH v36 00/13] /dev/random - a new approach

2020-11-17 Thread Torsten Duwe
On Mon, Nov 02, 2020 at 02:44:35PM +0100, Torsten Duwe wrote:
> 
> Ted, if you don't have the time any more to take care of /dev/random,
> it's not a shame to hand over maintainership, especially given your
> long history of Linux contributions.
> 
> Please do seriously consider handing it over to someone new. This would
> be a good opportunity.

I can see you are quite busy working on ext4, and there are a number of
patches for drivers/char/random.c awaiting review. Wouldn't it be good
to pass it on to someone more enthusiastic?

At least some sort of reply would be appreciated.
Or are you already pondering the request ;-) ?

Torsten



Re: [PATCH v36 00/13] /dev/random - a new approach

2020-11-02 Thread Torsten Duwe
On Wed, 28 Oct 2020 19:07:28 +0100
Greg Kroah-Hartman  wrote:

> On Wed, Oct 28, 2020 at 06:51:17PM +0100, Torsten Duwe wrote:
> > On Mon, 19 Oct 2020 21:28:50 +0200
> > Stephan Müller  wrote:
> > [...]
> > > * Sole use of crypto for data processing:
> > [...]
> > >  - The LRNG uses only properly defined and implemented
> > > cryptographic algorithms unlike the use of the SHA-1
> > > transformation in the existing /dev/random implementation.
> > > 
> > >  - Hash operations use NUMA-node-local hash instances to benefit
> > > large parallel systems.
> > > 
> > >  - LRNG uses limited number of data post-processing steps
> > [...]
> > > * Performance
> > > 
> > >  - Faster by up to 75% in the critical code path of the interrupt
> > > handler depending on data collection size configurable at kernel
> > > compile time - the default is about equal in performance with
> > > existing /dev/random as outlined in [2] section 4.2.
> > 
> > [...]
> > >  - ChaCha20 DRNG is significantly faster as implemented in the
> > > existing /dev/random as demonstrated with [2] table 2.
> > > 
> > >  - Faster entropy collection during boot time to reach fully
> > > seeded level, including on virtual systems or systems with SSDs as
> > > outlined in [2] section 4.1.
> > > 
> > > * Testing
> > [...]
> > 
> > So we now have 2 proposals for a state-of-the-art RNG, and over a
> > month without a single on-topic comment from anyone listed by `get_maintainer.pl`.
> > 
> > I don't want to emphasise the certification aspects so much. The
> > interrelation is rather that those certifications require certain
> > code features, features which are reasonable per se. But the
> > current code is lagging way behind.
> > 
> > I see the focus namely on performance, scalability, testability and
> > virtualisation. And it certainly is an advantage to use the code
> > already present under crypto, with its optimisations, and not rely
> > on some home brew.
> > 
> > Can we please have a discussion about how to proceed?
> > Ted, Greg, Arnd: which approach would you prefer?
> 
> Greg and Arnd are not the random driver maintainers, as is now
> correctly shown in the 5.10-rc1 MAINTAINERS file, so I doubt we (well
> at least I) have any say here, sorry.

No problem. get_maintainer (for the proposals) works on paths, not on
topics and I didn't want to leave anybody out.

Ted, if you don't have the time any more to take care of /dev/random,
it's not a shame to hand over maintainership, especially given your
long history of Linux contributions.

Please do seriously consider handing it over to someone new. This would
be a good opportunity.

Torsten



Re: [PATCH v36 00/13] /dev/random - a new approach

2020-10-28 Thread Torsten Duwe
On Mon, 19 Oct 2020 21:28:50 +0200
Stephan Müller  wrote:
[...]
> * Sole use of crypto for data processing:
[...]
>  - The LRNG uses only properly defined and implemented cryptographic
>algorithms unlike the use of the SHA-1 transformation in the
> existing /dev/random implementation.
> 
>  - Hash operations use NUMA-node-local hash instances to benefit large
>parallel systems.
> 
>  - LRNG uses limited number of data post-processing steps
[...]
> * Performance
> 
>  - Faster by up to 75% in the critical code path of the interrupt
> handler depending on data collection size configurable at kernel
> compile time - the default is about equal in performance with
> existing /dev/random as outlined in [2] section 4.2.

[...]
>  - ChaCha20 DRNG is significantly faster as implemented in the
> existing /dev/random as demonstrated with [2] table 2.
> 
>  - Faster entropy collection during boot time to reach fully seeded
>level, including on virtual systems or systems with SSDs as
> outlined in [2] section 4.1.
> 
> * Testing
[...]

So we now have 2 proposals for a state-of-the-art RNG, and over a month
without a single on-topic comment from anyone listed by `get_maintainer.pl`.

I don't want to emphasise the certification aspects so much. The
interrelation is rather that those certifications require certain code
features, features which are reasonable per se. But the current code is
lagging way behind.

I see the focus namely on performance, scalability, testability and
virtualisation. And it certainly is an advantage to use the code
already present under crypto, with its optimisations, and not rely
on some home brew.

Can we please have a discussion about how to proceed?
Ted, Greg, Arnd: which approach would you prefer?

Torsten



Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-10-16 Thread Torsten Duwe
On Fri, Oct 02, 2020 at 03:56:28PM +0200, Stephan Mueller wrote:
> Am Freitag, 2. Oktober 2020, 15:15:55 CEST schrieb Willy Tarreau:
> 
> Hi Willy,
> 
> > > And this is all ???
> > 
> > Possibly a lot of people got used to seeing the numerous versions
> > and are less attentive to new series, it's possible that your message
> > will wake everyone up.
> 
> I think that points to my patch series. For my patch series, which provides a
> completely separate, API- and ABI-compliant drop-in replacement for /dev/random,
> nobody from the gatekeepers cared to even answer. It would not touch the
> existing code.
> 
> After waiting some time without changing the code (e.g. after Andi Lutomirski
> commented), I got no answer at all from the gatekeepers, not even any
> indication in what direction I should move if something was not desired in
> the patch series.
> 
> Thus I continued adding the features that I think are necessary and for which 
> I received comments from mathematicians. What else should I do?
> 
> With the patch set v35 of my patch series, I see all my goals finally
> achieved and I expect the code to be stable from here on. The last one was the
> hardest: to get rid of all non-cryptographic conditioning operations and yet
> retain performance on par with or even superior to the existing /dev/random
> implementation.

Would you mind resending it here, for a comparison?

Torsten



Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-10-02 Thread Torsten Duwe
On Fri, Oct 02, 2020 at 03:33:58PM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 03:15:55PM +0200, Willy Tarreau wrote:
> > On Fri, Oct 02, 2020 at 02:38:36PM +0200, Torsten Duwe wrote:
> > > Almost two weeks passed and these are the "relevant" replies:
> > > 
> > > Jason personally does not like FIPS, and is afraid of
> > > "subpar crypto". Albeit this patch set strictly isn't about
> > > crypto at all; the crypto subsystem is in the unlucky position
> > > to just depend on a good entropy source.
> > > 
> > > Greg claims that Linux (kernel) isn't about choice, which is clearly
> > > wrong.
> > 
> > I think there's a small misunderstanding here, my understanding is
> > that for quite a while, the possibilities offered by the various
> > random subsystems or their proposed derivative used to range from
> > "you have to choose between a fast system that may be vulnerable
> > to some attacks, a system that might not be vulnerable to certain
> > attacks but might not always boot, or a slow system not vulnerable
> > to certain attacks". Greg's point seems to be that if we add an
> > option, it means it's yet another tradeoff between these possibilities
> > and that someone will still not be happy at the end of the chain. If
> > the proposed solution covers everything at once (performance,
> > reliability, unpredictability), then there probably is no more reason
> > for keeping alternate solutions at all, hence there's no need to give
> > the user the choice between multiple options when only one is known
> > to always be valid. At least that's how I see it and it makes sense
> > to me.
> 
> Thanks for spelling it out in much more detail than I was willing to :)

I assume you're not trying to pull the discussion off-topic. The one and
only choice here is that some people believe in NIST and certifications.
Yes, others don't, no problem either. The former folks boot with fips=1,
that's it. Those people are usually certain about their decision.

That option is about to break, for reasons I stated previously. This patch
set is to introduce the now-missing pieces. One thing worth discussing here
would be whether people who are not so security conscious should benefit from
the sanity checks as well. IMHO they should because, as Willy explained, one
should stick with the option that's always valid.

My disappointment was that _none_ of the maintainers had an on-topic,
technical remark. I get the impression some read "FIPS" and stop, regardless
of the actual functionality.

Torsten



Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-10-02 Thread Torsten Duwe
Almost two weeks passed and these are the "relevant" replies:

Jason personally does not like FIPS, and is afraid of
"subpar crypto". Albeit this patch set strictly isn't about
crypto at all; the crypto subsystem is in the unlucky position
to just depend on a good entropy source.

Greg claims that Linux (kernel) isn't about choice, which is clearly
wrong.

And this is all ???

There are options for stack protection. I can see bounds checking
and other sanity checks all over the place. And doing a similar thing
on entropy sources is a problem?

Admittedly, if entropy sources fail, the kernel will happily remain
running. No bad immediate effects in userland will arise. Only some
cryptographic algorithms, otherwise very decent, will run on
unnecessarily weak keys, probably causing some real-world problems.
Does anybody care?
The NIST and the BSI do, but that does not mean their solutions are
automatically wrong or backdoored.

There is now a well laid-out scheme to ensure quality randomness,
and a lot of work here has been put into its implementation.

Would some maintainer please comment on potential problems or
shortcomings? Otherwise a "Thanks, applied" would be appropriate, IMO.

Torsten



Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-09-22 Thread Torsten Duwe
On Tue, 22 Sep 2020 18:21:52 +0200
Greg Kroah-Hartman  wrote:

> On Tue, Sep 22, 2020 at 03:23:44PM +0200, Torsten Duwe wrote:
> > On Mon, Sep 21, 2020 at 10:40:37AM +0200, Stephan Mueller wrote:
> > > Am Montag, 21. September 2020, 09:58:16 CEST schrieb Nicolai
> > > Stange:
> > > 
> > > > - people dislike the approach of having two competing
> > > > implementations for what is basically the same functionality in
> > > > the kernel.
> > > 
> > > Is this really so bad considering the security implications on
> > > this topic? We also have multiple file systems, multiple memory
> > > allocators, etc...
> > 
> > Exactly. I thought Linux was about the freedom of choice.
> 
> http://www.islinuxaboutchoice.com/
> 
> :)

Talk is cheap.

gzip -dc /proc/config.gz | wc -l
9789

:-P
Torsten


Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-09-22 Thread Torsten Duwe
On Mon, Sep 21, 2020 at 10:40:37AM +0200, Stephan Mueller wrote:
> Am Montag, 21. September 2020, 09:58:16 CEST schrieb Nicolai Stange:
> 
> > - people dislike the approach of having two competing implementations for
> >   what is basically the same functionality in the kernel.
> 
> Is this really so bad considering the security implications on this topic? We 
> also have multiple file systems, multiple memory allocators, etc...

Exactly. I thought Linux was about the freedom of choice. Some people choose
to get a FIPS certification for their Linux-based products, which mostly
means restricting crypto capabilities to an "allowed" set, granted. But in
this case people might opt for some sort of "entropy QA". I find it hard to
accept that this option is suppressed, especially if that is due to the
maintainer's personal antipathy towards the origin of this change and not to
technical reasons. Restrictions on cryptographic functionality are OK, but
health tests on entropy sources are not?

I do understand people's reluctance after the Dual EC DRBG disaster, but
OTOH SELinux is generally considered an improvement. Definitely not
everything coming from that direction is tainted.

A big portion of this patch set is cleanup, another one the aforementioned
introduction of entropy source monitoring. This is important, no matter what
your attitude towards certifications might be.
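
To make concrete what "entropy source monitoring" means here, below is a
minimal sketch of one SP800-90B-style health test (the repetition count
test). This is only my illustration, not code from the patch set, and the
cutoff is an arbitrary example rather than a derived bound.

/*
 * Illustrative sketch only: the SP800-90B "Repetition Count Test" flags
 * an entropy source that gets stuck on a single value.
 */
#include <stdbool.h>
#include <stdint.h>

#define RCT_CUTOFF	31	/* example value; the real cutoff depends on the
				 * entropy estimate claimed for the source */

struct rct_state {
	uint8_t last;
	unsigned int count;
};

/* Feed one raw sample; returns false if the source looks stuck. */
static bool rct_ok(struct rct_state *s, uint8_t sample)
{
	if (sample == s->last) {
		if (++s->count >= RCT_CUTOFF)
			return false;	/* health test failure */
	} else {
		s->last = sample;
		s->count = 1;
	}
	return true;
}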

Torsten



Re: [PATCH] arm64: disable patchable function entry on big-endian clang builds

2020-05-05 Thread Torsten Duwe
Hi Arnd, Mark and others,

this may not be worth arguing about, but I'm currently fighting excessive
workarounds in another area, so this triggers me and I have to make
a remark ;-)

On Tue, 5 May 2020 15:25:56 +0100
Mark Rutland  wrote:

> On Tue, May 05, 2020 at 04:12:36PM +0200, Arnd Bergmann wrote:
> > Clang only supports the patchable_function_entry attribute on
> > little-endian arm64 builds, but not on big-endian:
> > 
> > include/linux/kasan-checks.h:16:8: error: unknown attribute
> > 'patchable_function_entry' ignored [-Werror,-Wunknown-attributes]
> > 
> > Disable that configuration with another dependency. Unfortunately
> > the existing check is not enough, as $(cc-option) at this point does
> > not pass the -mbig-endian flag.
> 
> Well that's unfortunate. :(
> 
> Do we know if this is deliberate and/or likely to change in future?
> This practically rules out a BE distro kernel with things like PAC,
> which isn't ideal.

But still better than accumulating workarounds. If clang's flags aren't
orthogonal, then that's a bug in clang. If I get a vote here, I'm against
it.

> > Fixes: 3b23e4991fb6 ("arm64: implement ftrace with regs")
> > Signed-off-by: Arnd Bergmann 
> 
> This looks fine for now, and we can add a version check in future, so:
  ^^^
see what I mean? And in the end it's another line of code you'll never
get rid of again.

I suggest getting a statement from the clang folks first about their schedule
and how much importance they attach to patchable-function-entry on BE, and
leaving it as is: broken on certain clang configurations. It's not the
kernel's fault.

> Acked-by: Mark Rutland 
> 
> Mark.
> 
> > ---
> >  arch/arm64/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 4b256fa6db7a..a33d6402b934 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -151,7 +151,7 @@ config ARM64
> > select HAVE_DMA_CONTIGUOUS
> > select HAVE_DYNAMIC_FTRACE
> > select HAVE_DYNAMIC_FTRACE_WITH_REGS \
> > -   if $(cc-option,-fpatchable-function-entry=2)
> > +   if $(cc-option,-fpatchable-function-entry=2) && !(CC_IS_CLANG && CPU_BIG_ENDIAN)
> > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > select HAVE_FAST_GUP
> > select HAVE_FTRACE_MCOUNT_RECORD
> > -- 
> > 2.26.0
> > 



Re: [PATCH v8 0/5] arm64: ftrace with regs

2019-10-19 Thread Torsten Duwe
Hi Mark!

On Fri, 18 Oct 2019 18:41:02 +0100 Mark Rutland
 wrote:

> In the process of reworking this I spotted some issues that will get
> in the way of livepatching. Notably:
> 
> * When modules can be loaded far away from the kernel, we'll
> potentially need a PLT for each function within a module, if each can
> be patched to a unique function. Currently we have a fixed number,
> which is only sufficient for the two ftrace entry trampolines.
> 
>   IIUC, the new code being patched in is itself a module, in which
> case we'd need a PLT for each function in the main kernel image.

When no live patching is involved, obviously all cases need to have
been handled so far. And when a live patching module comes in, there
are calls in and out of the new patch code:

Calls going into the live patch are not aware of this. They are caught
by an active ftrace intercept, and the actual call into the LP module
is done in klp_arch_set_pc, by manipulating the intercept (call site)
return address (in case the thread lives in the "new world", for
completeness' sake). This is an unsigned long write in C.

All calls going _out_ of the KLP module are newly generated, as part
of the KLP module building process, and are thus known to be
"extern" -- a PLT entry should be generated and accounted for in the
KLP module.
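
For illustration only, roughly what that "unsigned long write" amounts to;
this sketch assumes an arm64 klp_arch_set_pc() along the lines discussed
here and is not taken from any posted series:

/*
 * Sketch: the ftrace-with-regs handler hands the klp core a full pt_regs,
 * so redirecting execution into the replacement function is a single
 * store to the saved program counter.
 */
static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
{
	regs->pc = ip;	/* resume in the "new world" function */
}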

>   We have a few options here, e.g. changing which memory size model we
>   use, or reserving space for a PLT before each function using
>   -f patchable-function-entry=N,M.

Nonetheless I'm happy I once added the ,M option here. You never know :)

> * There are windows where backtracing will miss the callsite's caller,
>   as its address is not live in the LR or existing chain of frame
>   records. Thus we cannot claim to have a reliable stacktrace.
> 
>   I suspect we'll have to teach the stacktrace code to handle this as
> a special-case.

Yes, that's where I had to step back. The unwinder needs to stop where
the chain is even questionable. In _all_ cases. Missing only one race
condition means a lurking inconsistency.

OTOH it's not a problem to report "not reliable" when in doubt; the
thread in question will then get woken up and unwind itself.
It is only an optimisation to let all kernel threads which are
guaranteed to not contain any patched functions sleep on.
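
A rough sketch of that policy (the helper names are assumptions for
illustration, not the actual arm64 or livepatch code):

/*
 * Sketch only: report "unreliable" whenever the unwinder is in doubt and
 * let the task be woken to unwind itself; only tasks whose stacks are
 * provably free of patched functions may keep sleeping.
 */
static bool klp_task_may_sleep_on(struct task_struct *task)
{
	unsigned long entries[MAX_STACK_TRACE_DEPTH];
	int nr;

	nr = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
	if (nr < 0)
		return false;		/* questionable chain: not reliable */

	/* klp_stack_has_patched_func() is hypothetical here */
	return !klp_stack_has_patched_func(entries, nr);
}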

>   I'll try to write these up, as similar probably applies to other
>   architectures with a link register.

I thought I'd quickly give you my feedback upfront here.

Torsten



[PATCH 0/6] Add anx6345 DP/eDP bridge for Olimex Teres-I

2019-05-23 Thread Torsten Duwe
Hi all,

left over from my previous Teres-I device tree series, here comes
the revised anx6345 node for the Teres-I, along with the driver.
The innolux panel attached to it is already known; pinebooks can be
enabled on top of this series, once their panels are introduced.

Changes from the respective previous versions:

* the reset polarity is corrected in DT and the driver;
  things should be clearer now.

* as requested, add a panel (the known innolux,n116bge) and connect
  the ports.

* renamed dvdd?? to *-supply to match the established scheme

* trivial update to the #include list, to make it compile in 5.2

Torsten
  


Re: [PATCH 4/4] arm64: DTS: allwinner: a64: enable ANX6345 bridge on Teres-I

2019-05-16 Thread Torsten Duwe
On Thu, May 16, 2019 at 09:06:41AM -0700, Vasily Khoruzhick wrote:
> 
> Driver can talk to the panel over AUX channel only after t1+t3, t1 is
> up to 10ms, t3 is up to 200ms.

This is after power-on. The boot loader needs to deal with this.

> It works with older version of driver
> that keeps panel always on because it takes a while between driver
> probe and pipeline start.

No lid switch, no USB, no WiFi, no MMC. If you disable DCDC1 you'll
run out of wakeup-sources ;-) IOW: I see no practical way any OS
driver can switch this panel voltage off and survive...

> All in all - you don't need panel timings since there's EDID but you
> still need panel delays. Anyway, it's up to you and maintainers.

Let's give it a try.

Torsten



[PATCH 0/4] Add missing device nodes for Olimex Teres-I

2019-05-14 Thread Torsten Duwe
Hi all,

based on Maxime's sunxi-dt64-for-5.2, here is what I found so far
still missing in the device tree. Those bits and pieces have already
been submitted but were not yet applied. Currently I also have the
uart1 for bluetooth here, but I'm unsure about the bindings for the
in-kernel btuart.

  Torsten


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-04-03 Thread Torsten Duwe
On Wed, Apr 03, 2019 at 03:48:43AM +0100, Mark Rutland wrote:
> Hi Torsten,
> 
> Sorry for the long delay prior to this reply.

I was hoping you would come up with some code to speed things up :(

(For the record, v8 was the latest I sent, but nothing in the locations
mentioned here has changed)

> On Fri, Jan 18, 2019 at 05:39:08PM +0100, Torsten Duwe wrote:
> > --- a/arch/arm64/include/asm/ftrace.h
> > +++ b/arch/arm64/include/asm/ftrace.h
> > @@ -14,9 +14,24 @@
> >  #include 
> >  
> >  #define HAVE_FUNCTION_GRAPH_FP_TEST
> > -#define MCOUNT_ADDR((unsigned long)_mcount)
> >  #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
> >  
> > +/*
> > + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the 
> > beginning
> > + * of each function, with the second NOP actually calling ftrace. In 
> > contrary
> > + * to a classic _mcount call, the call instruction to be modified is thus
> > + * the second one, and not the only one.
> > + */
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > +/* All we need is some magic value. Simply use "_mCount:" */
> > +#define MCOUNT_ADDR(0x5f6d436f756e743a)
> 
> I'm really not keen on having a magic constant, and I'd really like to
> see MCOUNT_ADDR disappear entirely when the compiler doesn't generate an
> mcount call. I'm concerned that this is confusing and fragile.

Confusing: agreed. Fragile? I don't think so.

> I think that it would be better to make the core ftrace API agnostic of
> mcount, and to teach it that there's a one-time initialization step for
> callsites (which is not necessarily patching a call with a NOP).
> 
> We currently have:
> 
> ftrace_make_nop(mod, rec, addr)
> ftrace_make_call(rec, addr)
> ftrace_modify_call(rec, old_addr, new_addr)
> 
> ... whereas we could have:
> 
> ftrace_call_init(mod, rec)
> ftrace_call_enable(rec, addr)
> ftrace_call_disable(rec, addr)
> ftrace_call_modify(rec, old_addr, new_addr)
> 
> ... so we wouldn't need to special-case anything for initialization (and
> don't need MCOUNT_ADDR at all), and it would be clearer as to what was
> happening at each stage.

I'm fully on your side here, BUT...

This is the dynamic ftrace with regs implementation for aarch64 with the
_current_ dynamic ftrace API. Changing the latter is IMO a different issue.
This implementation has been tested, up to the point of some preliminary live
patching. I propose to commit this and only then change the API for aarch64
and s390 simultaneously, avoiding breakage on ppc64le and x86, of course.
(others to be investigated)
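
For what it's worth, a sketch of how such a one-time init step could look on
arm64, reusing helper names from this series; this is only my illustration of
the proposed API split, not an actual patch:

/*
 * Sketch of the proposed split: a per-site init hook installs the LR
 * saver into the first of the two NOPs, so no MCOUNT_ADDR magic value
 * would be needed any more.
 */
int ftrace_call_init(struct module *mod, struct dyn_ftrace *rec)
{
	/* rec->ip points at the branch site; the LR saver sits one insn earlier */
	unsigned long pc = rec->ip - AARCH64_INSN_SIZE;
	u32 old = aarch64_insn_gen_nop();
	u32 new = MOV_X9_X30;		/* "mov x9, x30" */

	return ftrace_modify_code(pc, old, new, true);
}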

> > +
> > +   /* The program counter just after the ftrace call site */
> > +   str lr, [sp, #S_PC]
> 
> For consistency with the existing ftrace assembly, please use 'x30'
> rather than 'lr'.

You could have raised that concern already along with the "fp" issue for v6.
I don't have a strong preference here; personally I find fp and lr more
readable, but with x29 and x30 one knows where they go.
This is just a waste of time.

> > -   module_disable_ro(mod);
> > -   *mod->arch.ftrace_trampoline = trampoline;
> > -   module_enable_ro(mod, true);
> > +   /* Check against our well-known list of ftrace entry points */
> > +   if (addr == FTRACE_ADDR || addr == FTRACE_REGS_ADDR) {
> 
> These checks exist within install_ftrace_trampoline(), so we don't need
> to duplicate them here.

True. Code evolution at work.

Any other opinions on the dynamic ftrace API change, anyone?

Torsten



Re: [PATCH] fs/open: Fix most outstanding security bugs

2019-04-01 Thread Torsten Duwe
On Mon, Apr 01, 2019 at 11:01:13AM +0200, Johannes Thumshirn wrote:
> Over the last 20 years, the Linux kernel has accumulated hundreds if not
> thousands of security vulnerabilities.
> 
> One common pattern in most of these security related reports is processes
> called "syzkaller", "trinity" or "syz-executor" opening files and then
> abuse kernel interfaces causing kernel crashes or even worse threats using
> memory overwrites or by exploiting race conditions.
> 
> Hunting down these bugs has become time consuming and very expensive, so
> I've decided to put an end to it.
> 
> If one of the above mentioned processes tries opening a file, return -EPERM
> indicating this process does not have the permission to open files on Linux
> anymore.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  fs/open.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/open.c b/fs/open.c
> index f1c2f855fd43..3a3b460beccd 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1056,6 +1056,20 @@ long do_sys_open(int dfd, const char __user *filename, 
> int flags, umode_t mode)
>   struct open_flags op;
>   int fd = build_open_flags(flags, mode, &op);
>   struct filename *tmp;
> + char comm[TASK_COMM_LEN];
> + int i;
> + static const char * const list[] = {

"list" is a bit ambiguous. You could call it "blacklist" or such.

> + "syzkaller",
> + "syz-executor,"
> + "trinity",
> + NULL
> + };
> +
> + get_task_comm(comm, current);
> +
> + for (i = 0; i < ARRAY_SIZE(list); i++)
> + if (!strncmp(comm, list[i], strlen(list[i])))
> + return -EPERM;
   ^^^
should be -ECONNRESET.

Also, I'm missing a sysfs parameter file to add more bad guys dynamically.

>   if (fd)
>   return fd;
> -- 
> 2.16.4

But for a start, this is OK.
In any case, as already mentioned, big player Cisco has shown us that this is
definitely the way to go!

Reviewed-by: Torsten Duwe 



Re: [PATCH v8 0/5] arm64: ftrace with regs

2019-03-29 Thread Torsten Duwe
On Mon, Mar 11, 2019 at 12:18:05PM +, Mark Rutland wrote:
> Hi Torsten,
> 
> On Mon, Mar 11, 2019 at 12:49:46PM +0100, Torsten Duwe wrote:
> > On Wed, Feb 13, 2019 at 11:11:04AM +, Julien Thierry wrote:
> > > Hi Torsten,
> > > 
> > > On 08/02/2019 15:08, Torsten Duwe wrote:
> > > > Patch series v8, as discussed.
> > > > The whole series applies cleanly on 5.0-rc5
> > 
> > So what's the status now? Besides debatable minor style
> > issues there were no more objections to v8. Would this
> > go through the ARM repo or via the ftrace repo?
> 
> Sorry, I have some half-written review comments that I will clean up and
> send shortly.

Ping?

> As commented on prior versions, I'd very much like to see the
> MCOUNT_ADDR hack go, by teaching the core ftrace code to not assume that
> an mcount symbol exists.

> We should be able to do that by separating the notion of NOPing a patch
> site from the notion of initializing it for the first time.

This is generally a good idea, and would affect other architectures as well,
see arch/s390/kernel/ftrace.c ftrace_make_nop(...)

I propose to do this in a second round.

Torsten


Re: [PATCH v8 0/5] arm64: ftrace with regs

2019-03-11 Thread Torsten Duwe
On Wed, Feb 13, 2019 at 11:11:04AM +, Julien Thierry wrote:
> Hi Torsten,
> 
> On 08/02/2019 15:08, Torsten Duwe wrote:
> > Patch series v8, as discussed.
> > The whole series applies cleanly on 5.0-rc5

So what's the status now? Besides debatable minor style
issues there were no more objections to v8. Would this
go through the ARM repo or via the ftrace repo?

Torsten



device tree binding for poly-phased regulators

2019-02-22 Thread Torsten Duwe
Hi!

Documentation/devicetree/bindings/mfd/axp20x.txt nicely describes the
capabilities of the X-powers PMICs; however, it seems polyphasing is
left to comments only.

May I suggest adding a property "poly-phased", just to get the discussion
started? It could be a simple property in the current case for the axp803
and axp806 ("poly-phased with you-know-which") or a phandle in the future.

BTW the axp803 datasheet is quite terse in that respect: is there one
"master" regulator whose settings are copied, or does a driver have to keep
them in sync?

Anyway, comments welcome!

Torsten



[PATCH v8 5/5] arm64: use -fpatchable-function-entry if available

2019-02-08 Thread Torsten Duwe
Test whether gcc supports -fpatchable-function-entry and use it to promote
DYNAMIC_FTRACE to DYNAMIC_FTRACE_WITH_REGS. Amend support for the new
object section that holds the locations (__patchable_function_entries) and
define a proper "notrace" attribute to switch it off.

Signed-off-by: Torsten Duwe 

---
 arch/arm64/Kconfig|2 ++
 arch/arm64/Makefile   |5 +
 include/asm-generic/vmlinux.lds.h |1 +
 include/linux/compiler_types.h|2 ++
 kernel/module.c   |7 ++-
 5 files changed, 16 insertions(+), 1 deletion(-)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -132,6 +132,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -89,6 +89,11 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FENTRY
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -113,6 +113,7 @@
 #define MCOUNT_REC()   . = ALIGN(8);   \
__start_mcount_loc = .; \
KEEP(*(__mcount_loc))   \
+   KEEP(*(__patchable_function_entries))   \
__stop_mcount_loc = .;
 #else
 #define MCOUNT_REC()
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -112,6 +112,8 @@ struct ftrace_likely_data {
 
 #if defined(CC_USING_HOTPATCH)
 #define notrace			__attribute__((hotpatch(0, 0)))
+#elif defined(CC_USING_PATCHABLE_FENTRY)
+#define notrace			__attribute__((patchable_function_entry(0)))
 #else
 #define notrace			__attribute__((__no_instrument_function__))
 #endif
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3122,7 +3122,12 @@ static int find_module_sections(struct m
 #endif
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
/* sechdrs[0].sh_size is always zero */
-   mod->ftrace_callsites = section_objs(info, "__mcount_loc",
+   mod->ftrace_callsites = section_objs(info,
+#ifdef CC_USING_PATCHABLE_FENTRY
+"__patchable_function_entries",
+#else
+"__mcount_loc",
+#endif
 sizeof(*mod->ftrace_callsites),
 &mod->num_ftrace_callsites);
 #endif


[PATCH v8 3/5] arm64: replace -pg with CC_FLAGS_FTRACE in mm/kasan Makefile

2019-02-08 Thread Torsten Duwe
  In preparation for arm64 supporting ftrace built on other compiler
  options, let's have makefiles remove the $(CC_FLAGS_FTRACE)
  flags, whatever these may be, rather than assuming '-pg'.

  There should be no functional change as a result of this patch.

Signed-off-by: Torsten Duwe 

---
 mm/kasan/Makefile |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/kasan/Makefile
+++ b/mm/kasan/Makefile
@@ -5,8 +5,8 @@ UBSAN_SANITIZE_generic.o := n
 UBSAN_SANITIZE_tags.o := n
 KCOV_INSTRUMENT := n
 
-CFLAGS_REMOVE_common.o = -pg
-CFLAGS_REMOVE_generic.o = -pg
+CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_generic.o = $(CC_FLAGS_FTRACE)
 # Function splitter causes unnecessary splits in __asan_load1/__asan_store1
 # see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63533
 


[PATCH v8 2/5] arm64: replace -pg with CC_FLAGS_FTRACE in efi Makefiles

2019-02-08 Thread Torsten Duwe
  In preparation for arm64 supporting ftrace built on other compiler
  options, let's have makefiles remove the $(CC_FLAGS_FTRACE)
  flags, whatever these may be, rather than assuming '-pg'.
  While at it, fix arm32 as well.

  There should be no functional change as a result of this patch.

Signed-off-by: Torsten Duwe 

---
 drivers/firmware/efi/libstub/Makefile |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -16,9 +16,9 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
-  $(DISABLE_STACKLEAK_PLUGIN)
-cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
+cflags-$(CONFIG_ARM64) := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
+  -fpie $(DISABLE_STACKLEAK_PLUGIN)
+cflags-$(CONFIG_ARM)   := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic \
   $(call cc-option,-mno-single-pic-base)
 


[PATCH v8 4/5] arm64: implement ftrace with regs

2019-02-08 Thread Torsten Duwe
Implement ftrace with regs, based on the new gcc flag
-fpatchable-function-entry (=2)

Now that gcc8 added 2 NOPs at the beginning of each function, replace
the first NOP thus generated with a quick LR saver (move it to scratch
reg x9), so the 2nd replacement insn, the call to ftrace, does not
clobber the value. Ftrace will then generate the standard stack
frames.
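
To illustrate the resulting entry sequence (my sketch, not part of the patch
text itself; ftrace_regs_caller is the with-regs entry added below):

/*
 * compiler output (-fpatchable-function-entry=2):
 *	func:	nop
 *		nop
 *		<function body>
 *
 * after enabling tracing for func:
 *	func:	mov	x9, x30			// LR saver, x9 keeps the return address
 *		bl	ftrace_regs_caller	// only clobbers x30
 *		<function body>
 */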

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed and scratch registers such
as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline in an ftrace_trampolines[2] array, and double
the size of the corresponding special section.

Signed-off-by: Torsten Duwe 

---
 arch/arm64/include/asm/ftrace.h  |   16 
 arch/arm64/include/asm/module.h  |3 
 arch/arm64/kernel/entry-ftrace.S |  125 +--
 arch/arm64/kernel/ftrace.c   |  117 
 arch/arm64/kernel/module-plts.c  |3 
 arch/arm64/kernel/module.c   |2 
 6 files changed, 231 insertions(+), 35 deletions(-)
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -14,9 +14,16 @@
 #include 
 
 #define HAVE_FUNCTION_GRAPH_FP_TEST
-#define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+/* All we need is some magic value. Simply use "_mCount:" */
+#define MCOUNT_ADDR(0x5f6d436f756e743a)
+#else
+#define MCOUNT_ADDR((unsigned long)_mcount)
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
@@ -34,6 +41,13 @@ extern void return_to_handler(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
/*
+* For -fpatchable-function-entry=2, there's first the
+* LR saver, and only then the actual call insn.
+* Advance addr accordingly.
+*/
+   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
+   return (addr + AARCH64_INSN_SIZE);
+   /*
 * addr is the address of the mcount call instruction.
 * recordmcount does the necessary offset calculation.
 */
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
 NOKPROBE(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)  // ftrace_gra
 
mcount_exit
 ENDPROC(ftrace_caller)
-#endif /* CONFIG_DYNAMIC_FTRACE */
-
-ENTRY(ftrace_stub)
-   ret
-ENDPROC(ftrace_stub)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 /*
@@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
 
mcount_exit
 ENDPROC(ftrace_graph_caller)
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+   .macro  ftrace_regs_entry, allregs=0
+   /* make room for pt_regs, plus a callee frame */
+   sub sp, sp, #(S_FRAME_SIZE + 16)
+
+   /* save function arguments */
+   stp x0, x1, [sp, #S_X0]
+   stp x2, x3, [sp, #S_X2]
+   stp x4, x5, [sp, #S_X4]
+   stp x6, x7, [sp, #S_X6]
+   stp x8, x9, [sp, #S_X8]
 
+   .if \allregs == 1
+   stp x10, x11, [sp, #S_X10]
+   stp x12, x13, [sp, #S_X12]
+   stp x14, x15, [sp, #S_X14]
+   stp x16, x17, [sp, #S_X16]
+   stp x18, x19, [sp, #S_X18]
+   stp x20, x21, [sp, #S_X20]
+   stp x22, x23, [sp, #S_X22]
+   stp x24, x25, [sp, #S_X24]
+   stp x26, x27, [sp, #S_X26]
+   .endif
+
+   /* Save fp and x28, which is used in this function. */
+   stp x28, x29, [sp, #S_X28]
+
+   /* The stack pointer as it was on ftrace_caller entry... */
+   add x28, sp, #(S_FRAME_SIZE + 16)
+   /* ...and the link Register at callee entry */
+   stp x9, x28, [sp, #S_LR]/* to pt_regs.r[30] and .sp */
+
+   /* The program counter just after the ftrace call site */
+   str lr, [sp, #S_PC]
+
+   /* Now fill in callee's preliminary stackframe. */
+   stp x29, x9, [sp, #S_FRAME_SIZE]
+   /* Let FP point to it. */
+   add x29, sp, #S_FRAME_SIZE
+
+   /* Our stackframe, stored inside pt_regs. */
+   stp x29, x30, [sp, #S_STACKFRAME]
+   add x29, sp, #S_STACKFRAME
+   .endm
+
+ENTRY(ftrace_regs_caller)
+   ftrace_regs_entry   1
+   b   ftrace_common
+ENDPROC(ftrace_regs_caller)
+
+ENTRY(ftrace_caller)
+   ftrace_regs_entry   0
+   b   ftrace_common
+ENDPROC(ftrace_caller)
+
+ENTRY(ftrace_common)
+
+   mov 

[PATCH v8 1/5] arm64: replace -pg with CC_FLAGS_FTRACE in arm64 Makefiles

2019-02-08 Thread Torsten Duwe
  In preparation for arm64 supporting ftrace built on other compiler
  options, let's have the arm64 makefiles remove the $(CC_FLAGS_FTRACE)
  flags, whatever these may be, rather than assuming '-pg'.

  There should be no functional change as a result of this patch.

Signed-off-by: Torsten Duwe 

---
 arch/arm64/kernel/Makefile |6 +++---
 arch/arm64/lib/Makefile|2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 obj-y  := debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -24,7 +24,7 @@ CFLAGS_atomic_ll_sc.o := -ffixed-x1 -ffi
   -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12   \
   -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15   \
   -fcall-saved-x18 -fomit-frame-pointer
-CFLAGS_REMOVE_atomic_ll_sc.o := -pg
+CFLAGS_REMOVE_atomic_ll_sc.o := $(CC_FLAGS_FTRACE)
 GCOV_PROFILE_atomic_ll_sc.o:= n
 KASAN_SANITIZE_atomic_ll_sc.o  := n
 KCOV_INSTRUMENT_atomic_ll_sc.o := n


[PATCH v8 0/5] arm64: ftrace with regs

2019-02-08 Thread Torsten Duwe
Patch series v8, as discussed.
The whole series applies cleanly on 5.0-rc5

---
 arch/arm64/Kconfig|4 +
 arch/arm64/Makefile   |   10 ++
 arch/arm64/include/asm/ftrace.h   |   16 
 arch/arm64/include/asm/module.h   |3 
 arch/arm64/kernel/Makefile|   12 +--
 arch/arm64/kernel/entry-ftrace.S  |  125 --
 arch/arm64/kernel/ftrace.c|  117 ---
 arch/arm64/kernel/module-plts.c   |3 
 arch/arm64/kernel/module.c|2 
 arch/arm64/lib/Makefile   |4 -
 drivers/firmware/efi/libstub/Makefile |   12 +--
 include/asm-generic/vmlinux.lds.h |2 
 include/linux/compiler_types.h|4 +
 kernel/module.c   |   14 +++
 mm/kasan/Makefile |8 +-
 15 files changed, 281 insertions(+), 55 deletions(-)
---
changes since v7:

* -pg -> $(CC_FLAGS_FTRACE) cleanup now split according to subtree
  maintainership.

* REC_IP_BRANCH_OFFSET is gone, the functionality went into
  ftrace_call_adjust(), where it belongs.

* MOV_X9_X30 macro is gone (why did we argue about its name anyway?);
  it is now used only once, in the new helper ftrace_setup_lr_saver()
  called from the initial ftrace_make_nop(), as suggested by Julien.

* call site processing was missing for modules. Fixed.

changes since v6:

* change the stack layout once more; I hope I have it the "standard" way now.
  And yes, it looks simpler and cleaner; thanks, Mark, for nagging.

* split out the independent Kconfig and Makefile changes

* fixed style issues

* s/fp/x29/g

* MCOUNT_ADDR is now merely a 64-bit magic, as this is totally sufficient.

* QUICK_LR_SAVE renamed back to MOV_X9_X30.

* place MOV_X9_X30 insns on bootup, and only flip b <-> nop at runtime

* graph tracer "ifdeffery" reshuffle

Torsten
  


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Torsten Duwe
On Thu, Feb 07, 2019 at 09:51:57AM -0500, Steven Rostedt wrote:
> On Thu, 7 Feb 2019 10:33:50 +
> Julien Thierry  wrote:
> 
> > I don't see really much documentation on that function. As far as I can
> > tell it is only called once for each site (and if it didn't, we'd always
> > be placing the same instruction, but I agree it wouldn't be nice). It
> > could depend on how far you can expand the notion of "adjusting" :) .
> > 
> > Steven, do you have an opinion on whether it would be acceptable to
> > modify function entry code in ftrace_call_adjust() ?
> 
> Just to make sure I'm on the same page as you are. You want to modify
> the function entry code at the time of the ftrace_call_adjust()?

Yes, this was the fiendish plan ;-)

> I would update the rec->ip to the offset you want at
> ftrace_call_adjust() but not do any modifications. It really isn't safe
> to do it there. But right after that is called, you will have the arch
> specific call of ftrace_make_nop() called with MCOUNT_ADDR as the
> second parameter to let you know that this is the first time the call
> is made at this address. This is where you can do that initial
> modifications.

Ok, so just substitute REC_IP_BRANCH_OFFSET arithmetic with
ftrace_call_adjust() and keep the MCOUNT_ADDR logic.
Thanks for the clarification.

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-07 Thread Torsten Duwe
On Thu, Feb 07, 2019 at 10:33:50AM +, Julien Thierry wrote:
> 
> 
> On 06/02/2019 15:05, Torsten Duwe wrote:
> > On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
> >> Hi Torsten,
> >>
> >> On 18/01/2019 16:39, Torsten Duwe wrote:
> >>
> >>> --- a/arch/arm64/kernel/ftrace.c
> >>> +++ b/arch/arm64/kernel/ftrace.c
> >>> @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
> >>>   return ftrace_modify_code(pc, old, new, true);
> >>>  }
> >>>  
> >>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >>> +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >>> + unsigned long addr)
> >>> +{
> >>> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> >>> + u32 old, new;
> >>> +
> >>> + old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> >>> + new = aarch64_insn_gen_branch_imm(pc, addr, true);
> >>> +
> >>> + return ftrace_modify_code(pc, old, new, true);
> >>> +}
> >>> +#endif
> >>> +
> >>>  /*
> >>>   * Turn off the call to ftrace_caller() in instrumented function
> >>>   */
> >>>  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >>>   unsigned long addr)
> >>>  {
> >>> - unsigned long pc = rec->ip;
> >>> + unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> >>
> >> Sorry to come back on this patch again, but I was looking at the ftrace
> >> code a bit, and I see that when processing the ftrace call locations,
> >> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> >> caller (or in our case patchable entries). This ftrace_call_adjust() is
> >> arch specific, so I was thinking we could place the offset in here once
> >> and for all so we don't have to worry about it in the future.
> > 
> > Now that you mention it - yes indeed that's the correct facility to fix
> > the deviating address, as Steve has also confirmed. I had totally forgotten
> > about this hook.
> > 
> >> Also, I'm unsure whether it would be safe, but we could patch the "mov
> >> x9, lr" there as well. In theory, this would be called at init time
> >> (before secondary CPUs are brought up) and when loading a module (so I'd
> >> expect no-one is executing that code *yet*.
> >>
> >> If this is possible, I think it would make things a bit cleaner.
> > 
> > This is in fact very tempting, but it will introduce a nasty side effect
> > to ftrace_call_adjust. Is there any obvious documentation that specifies
> > guarantees about ftrace_call_adjust being called exactly once for each site?
> > 
> 
> I don't see really much documentation on that function. As far as I can
> tell it is only called once for each site (and if it didn't, we'd always
> be placing the same instruction, but I agree it wouldn't be nice). It
> could depend on how far you can expand the notion of "adjusting" :) .

I've been thinking this over and I'm considering making an ftrace_modify_code
call with verify, and warn_once if it fails. Then read the insn back and
BUG_ON should it not be the LR saver. Any objections?
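
Roughly along these lines (just a sketch of the idea, not a tested patch;
helper names as used elsewhere in this series):

/*
 * Install the LR saver with validation, warn once if patching fails, and
 * BUG if the site does not hold the expected instruction afterwards.
 */
static int ftrace_setup_lr_saver(unsigned long pc)
{
	u32 insn;

	if (WARN_ONCE(ftrace_modify_code(pc, aarch64_insn_gen_nop(),
					 MOV_X9_X30, true),
		      "ftrace: LR saver patching failed at %lx\n", pc))
		return -EINVAL;

	if (aarch64_insn_read((void *)pc, &insn))
		return -EFAULT;
	BUG_ON(insn != MOV_X9_X30);
	return 0;
}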

> Steven, do you have an opinion on whether it would be acceptable to
> modify function entry code in ftrace_call_adjust() ?

Yes, Steve's vote first.

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-06 Thread Torsten Duwe
On Wed, Feb 06, 2019 at 08:59:44AM +, Julien Thierry wrote:
> Hi Torsten,
> 
> On 18/01/2019 16:39, Torsten Duwe wrote:
> 
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -133,17 +163,45 @@ int ftrace_make_call(struct dyn_ftrace *
> > return ftrace_modify_code(pc, old, new, true);
> >  }
> >  
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > +   unsigned long addr)
> > +{
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> > +
> > +   return ftrace_modify_code(pc, old, new, true);
> > +}
> > +#endif
> > +
> >  /*
> >   * Turn off the call to ftrace_caller() in instrumented function
> >   */
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> > unsigned long addr)
> >  {
> > -   unsigned long pc = rec->ip;
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> 
> Sorry to come back on this patch again, but I was looking at the ftrace
> code a bit, and I see that when processing the ftrace call locations,
> ftrace calls ftrace_call_adjust() on every ip registered as mcount
> caller (or in our case patchable entries). This ftrace_call_adjust() is
> arch specific, so I was thinking we could place the offset in here once
> and for all so we don't have to worry about it in the future.

Now that you mention it - yes indeed that's the correct facility to fix
the deviating address, as Steve has also confirmed. I had totally forgotten
about this hook.

> Also, I'm unsure whether it would be safe, but we could patch the "mov
> x9, lr" there as well. In theory, this would be called at init time
> (before secondary CPUs are brought up) and when loading a module (so I'd
> expect no-one is executing that code *yet*.
> 
> If this is possible, I think it would make things a bit cleaner.

This is in fact very tempting, but it will introduce a nasty side effect
to ftrace_call_adjust. Is there any obvious documentation that specifies
guarantees about ftrace_call_adjust being called exactly once for each site?

Torsten



Re: [PATCH RESEND v2 05/12] drm/bridge: Add Analogix anx6345 support

2019-02-05 Thread Torsten Duwe
First thing that struck me is that the chip's reset is actually low active

   reset-gpios = <&pio 3 24 GPIO_ACTIVE_LOW>; /* PD24 */

(please correct this in patches 11 and 12)

Consequently, you're using inverted values here in the driver:

> +static void anx6345_poweron(struct anx6345 *anx6345)
> +{
[...]
> + gpiod_set_value_cansleep(pdata->gpiod_reset, 0);

0 = reset on, ok.

> + usleep_range(1000, 2000);
> +
> + gpiod_set_value_cansleep(pdata->gpiod_reset, 1);

1 = reset off, also fine.

> +
> + /* Power on registers module */
> + anx6345_set_bits(anx6345->map[I2C_IDX_TXCOM], SP_POWERDOWN_CTRL_REG,
> +  SP_HDCP_PD | SP_AUDIO_PD | SP_VIDEO_PD | SP_LINK_PD);
> + anx6345_clear_bits(anx6345->map[I2C_IDX_TXCOM], SP_POWERDOWN_CTRL_REG,
> +SP_REGISTER_PD | SP_TOTAL_PD);
> +
> + anx6345->powered = true;
> +}
> +
> +static void anx6345_poweroff(struct anx6345 *anx6345)
> +{
> + struct anx6345_platform_data *pdata = &anx6345->pdata;
> + int err;
> +
> + if (WARN_ON(!anx6345->powered))
> + return;
> +
> + gpiod_set_value_cansleep(pdata->gpiod_reset, 1);
> + usleep_range(1000, 2000);

This one got me a bit confused. Assert or deassert reset (again) before
poweroff?

Please stick to the logical value of the reset line.

Torsten



Re: [PATCH 0/9] Analogix ANX6345 RGB-(e)DP bridge support

2019-02-04 Thread Torsten Duwe
On Thu, Oct 18, 2018 at 03:33:18PM +0800, Icenowy Zheng wrote:
> This patchset brings the support for Analogix ANX6345 RGB-(e)DP bridge,
> which is used by some Allwinner A64 laptops, such as Pinebook and Olimex
> TERES-I.
> 

So what's the status here? I'm working on the Teres-I and I find myself
recreating the chunks in this patchset almost verbatim (only DT so far),
one by one, so there must be something right about them ;-)

Whose turn is it?

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-02-04 Thread Torsten Duwe
On Tue, Jan 22, 2019 at 02:55:12PM +0100, Ard Biesheuvel wrote:
> On Tue, 22 Jan 2019 at 14:28, Torsten Duwe  wrote:
> >
> > On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> > > Hi Torsten,
> > >
> > > A few suggestions below.
> > >
> > > > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > > > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > > > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > > > +/* All we need is some magic value. Simply use "_mCount:" */
> > >
> > > Nit: Should the casing be "_mcount" ?
> >
> > No! The string makes it clear what it's supposed to be and the peculiar
> > casing makes it unique and leaves no doubt where it came from. So whenever
> > you see this register value in a crash dump you don't have to wonder about
> > weird linkage errors, as it surely did not originate from a symtab.
> >
> 
> In that case, do you need to deal with endianness here?
> 
> > > > +#define MCOUNT_ADDR(0x5f6d436f756e743a)

Strictly speaking, yes. OTOH "wrong-endian" machines always show a difference
when memory is dumped in bigger units than bytes, so when you see the register
value in hex...

Since all that's needed is a somewhat unique constant, let's not over-engineer
this, ok?
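
Just to spell out the endianness point (a standalone illustration, nothing
taken from the patch itself):

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* The constant spells "_mCount:" when its bytes are read MSB first. */
	unsigned long magic = 0x5f6d436f756e743a;
	unsigned long as_stored;

	memcpy(&as_stored, "_mCount:", sizeof(as_stored));

	/* On a little-endian host the two values differ; since it is only a
	 * unique cookie, that difference is harmless for its purpose. */
	printf("constant:     %#lx\n", magic);
	printf("string bytes: %#lx\n", as_stored);
	return 0;
}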

If there are no other comments I'll send out v8 with the discussed changes.

Torsten



Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Torsten Duwe
On Tue, Jan 22, 2019 at 10:18:17AM +, Julien Thierry wrote:
> Hi Torsten,
> 
> A few suggestions below.
> 
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > +/* All we need is some magic value. Simply use "_mCount:" */
> 
> Nit: Should the casing be "_mcount" ?

No! The string makes it clear what it's supposed to be and the peculiar
casing makes it unique and leaves no doubt where it came from. So whenever
you see this register value in a crash dump you don't have to wonder about
weird linkage errors, as it surely did not originate from a symtab.

> > +#define MCOUNT_ADDR(0x5f6d436f756e743a)
> > +#else
> > +#define REC_IP_BRANCH_OFFSET 0
> > +#define MCOUNT_ADDR((unsigned long)_mcount)
> > +#endif
> > +
> >  #ifndef __ASSEMBLY__
> >  #include 
> >  
> > --- a/arch/arm64/kernel/entry-ftrace.S
> > +++ b/arch/arm64/kernel/entry-ftrace.S
> > @@ -10,6 +10,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
^^^
> > +
> > +ENTRY(ftrace_common)
> > +
> > +   mov x3, sp  /* pt_regs are @sp */
> > +   ldr_l   x2, function_trace_op, x0
> > +   mov x1, x9  /* parent IP */
> > +   sub x0, lr, #8  /* function entry == IP */
> 
> The #8 is because we go back two instructions right? Can we use
> #(AARCH64_INSN_SIZE * 2) instead?

Hmm, the needed header was already included, so why not. (same below)

> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > +   unsigned long addr)
> > +{
> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> 
> The last argument of aarch64_insn_gen_branch_imm() is an enum, not a
> boolean.
> 
> You should use AARCH64_INSN_BRANCH_LINK here which happens to be equal to 1.

That's what you get when you keep copying code...
Good catch, thanks!

> > +*  initially; the NOPs are already there. So instead,
> > +*  put the LR saver there ahead of time, in order to avoid
> > +*  any race condition over patching 2 instructions.
> > +*/
> > +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> > +   addr == MCOUNT_ADDR) {
> 
> This works, however it feels a bit weird since core code asked to
> generate a NOP but not only we don't generate it but we put another
> instruction instead.

It's not the first time weird x86 sets the standards and all others
need workarounds. But here it just came in handy to abuse this call :-)

> I think it would be useful to state that the replacement of mcount calls
> by nops is only ever done once at system initialization.
> 
> Or maybe have a intermediate function:
> 
> static int ftrace_setup_patchable_entry(unsigned long addr)
> {
>   u32 old, new;
> 
>   old = aarch64_insn_gen_nop();
>   new = MOV_X9_X30;
>   pc -= REC_IP_BRANCH_OFFSET;
>   return ftrace_modify_code(pc, old, new, validate);
> }
> 
>   [...]
> 
>   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
>   addr == MCOUNT_ADDR)
>   return ftrace_setup_patchable_entry(pc);
> 
> 
> This way it clearly show that this is a setup/init corner case.

I'll definitely consider this.
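
Roughly along these lines, I suppose (only a sketch; MOV_X9_X30 and
REC_IP_BRANCH_OFFSET as in the current patch, validation hard-coded):

	static int ftrace_setup_patchable_entry(unsigned long pc)
	{
		u32 old = aarch64_insn_gen_nop();
		u32 new = MOV_X9_X30;

		pc -= REC_IP_BRANCH_OFFSET;	/* back to the first NOP of the pair */
		return ftrace_modify_code(pc, old, new, true);
	}

	/* ... and in ftrace_make_nop(): */
	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
	    addr == MCOUNT_ADDR)
		return ftrace_setup_patchable_entry(pc);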

Thanks for the input,

Torsten




Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-22 Thread Torsten Duwe
Hi Balbir!

On Tue, Jan 22, 2019 at 02:39:32PM +1300, Singh, Balbir wrote:
> 
> On 1/19/19 5:39 AM, Torsten Duwe wrote:
> > + */
> > +ftrace_common_return:
> > +   /* restore function args */
> > +   ldp x0, x1, [sp]
> > +   ldp x2, x3, [sp, #S_X2]
> > +   ldp x4, x5, [sp, #S_X4]
> > +   ldp x6, x7, [sp, #S_X6]
> > +   ldr x8, [sp, #S_X8]
> > +
> > +   /* restore fp and x28 */
> > +   ldp x28, x29, [sp, #S_X28]
> > +
> > +   ldr lr, [sp, #S_LR]
> > +   ldr x9, [sp, #S_PC]
> 
> Is it fair to assume that we never modify registers beyond LR and PC as a 
> result of ftrace/livepatching? I presume it is, but just checking.

These are either callee-saved or scratch. Whatever is called, be it ftrace framework
functions or replacement functions, must preserve the callee-saved regs; and
the caller, who made a function call (sic!), saves the caller-saved ones and marks the
rest dead on return. So it's the arguments that matter after all.

As you can see, disabling IPA-RA is crucial here.

Or are you talking about deliberate argument manipulation?

> > +   unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
> > +   u32 old, new;
> > +
> > +   old = aarch64_insn_gen_branch_imm(pc, old_addr, true);
> > +   new = aarch64_insn_gen_branch_imm(pc, addr, true);
> > +
> 
> Is this a branch or a call? Does addr always fit in the immediate limits?

As Julien has now pointed out, the correct enum value AARCH64_INSN_BRANCH_LINK
should clarify this. It will surely fit for the kernel proper, and the modules
are handled with the trampolines.

> > +   return ftrace_modify_code(pc, old, new, true);
> 
> Can you talk to the semantics of whether this operation is atomic w.r.t 
> system? Will old and new return consistent values? Given the nature of 
> ftrace, I presume it's well isolated. 

aarch64_insn_patch_text_nosync() does a __flush_icache_range() on success.
Mark wrote that this is already sufficient IIRC. (I had memory barriers
there, when I was still trying to modify 2 insns every time).
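
For context, the patching path relied on here is roughly the following (quoted from
memory, so take the details with a grain of salt):

	static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
				      bool validate)
	{
		u32 replaced;

		if (validate) {
			if (aarch64_insn_read((void *)pc, &replaced))
				return -EFAULT;
			if (replaced != old)
				return -EINVAL;
		}
		if (aarch64_insn_patch_text_nosync((void *)pc, new))
			return -EPERM;
		return 0;
	}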

> 
> > +   if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) &&
> > +   addr == MCOUNT_ADDR) {
> > +   old = aarch64_insn_gen_nop();
> > +   new = MOV_X9_X30;
> > +   pc -= REC_IP_BRANCH_OFFSET;
> > +   return ftrace_modify_code(pc, old, new, validate);
> 
> I presume all the icache flush and barrier handling is in 
> ftrace_modify_code()?

Yes, see above.

> > +   }
> > +
> > if (offset < -SZ_128M || offset >= SZ_128M) {
> >  #ifdef CONFIG_ARM64_MODULE_PLTS
> > u32 replaced;
> > --- a/arch/arm64/include/asm/module.h
> > +++ b/arch/arm64/include/asm/module.h
> > @@ -32,7 +32,8 @@ struct mod_arch_specific {
> > struct mod_plt_sec  init;
> >  
> > /* for CONFIG_DYNAMIC_FTRACE */
> > -   struct plt_entry*ftrace_trampoline;
> > +   struct plt_entry*ftrace_trampolines;
> > +#define MOD_ARCH_NR_FTRACE_TRAMPOLINES 2
> 
> I don't see the generation of ftrace_trampolines[1]
> 

That was further up, install_ftrace_trampoline() in kernel/ftrace.c.

+   if (*addr == FTRACE_ADDR)
+   mod_trampoline = &mod->arch.ftrace_trampolines[0];
+   else if (*addr == FTRACE_REGS_ADDR)
+   mod_trampoline = &mod->arch.ftrace_trampolines[1];
[...]
+   trampoline = get_plt_entry(*addr, mod_trampoline);
+
+   if (!plt_entries_equal(mod_trampoline, &trampoline)) {
[...]

get_plt_entry() generates a small bunch of instructions that easily
fit into the argument registers. Compare commit bdb85cd1d20669dfae8
for the new trampoline insns.

Hope I've covered all your concerns,

Torsten



[PATCH v7 1/3] arm64: replace -pg with CC_FLAGS_FTRACE in Makefiles

2019-01-18 Thread Torsten Duwe
Ftrace instrumentation might also be introduced by
-fpatchable-function-entry, not only -pg. Ensure the Makefiles are
flexible enough to filter out the respective flags in "notrace" directories.

Signed-off-by: Torsten Duwe 

---
 arch/arm64/kernel/Makefile|6 +++---
 drivers/firmware/efi/libstub/Makefile |3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 obj-y  := debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -16,7 +16,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
+cflags-$(CONFIG_ARM64) := $(filter-out $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie \
   $(DISABLE_STACKLEAK_PLUGIN)
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic \


[PATCH v7 2/3] arm64: implement ftrace with regs

2019-01-18 Thread Torsten Duwe
Once gcc8 adds 2 NOPs at the beginning of each function, replace the
first NOP thus generated with a quick LR saver (move it to scratch reg
x9), so the 2nd replacement insn, the call to ftrace, does not clobber
the value. Ftrace will then generate the standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section.

Signed-off-by: Torsten Duwe 

---

Mark, if you see your ftrace entry macro code being represented correctly
here, please add your sign-off, as I initially copied it from your mail.
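
For reviewers skimming the diff: the call-site patching below boils down to
something like this (illustrative sketch using the series' names, module/PLT
handling omitted, not a literal excerpt):

	int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
	{
		unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET; /* the second NOP */
		u32 old, new;

		old = aarch64_insn_gen_nop();
		new = aarch64_insn_gen_branch_imm(pc, addr, AARCH64_INSN_BRANCH_LINK);
		return ftrace_modify_code(pc, old, new, true);
	}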

---
 arch/arm64/include/asm/ftrace.h  |   17 -
 arch/arm64/include/asm/module.h  |3 
 arch/arm64/kernel/entry-ftrace.S |  125 +--
 arch/arm64/kernel/ftrace.c   |  114 ++-
 arch/arm64/kernel/module-plts.c  |3 
 arch/arm64/kernel/module.c   |2 
 6 files changed, 227 insertions(+), 37 deletions(-)
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -14,9 +14,24 @@
 #include 
 
 #define HAVE_FUNCTION_GRAPH_FP_TEST
-#define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+/* All we need is some magic value. Simply use "_mCount:" */
+#define MCOUNT_ADDR(0x5f6d436f756e743a)
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#define MCOUNT_ADDR((unsigned long)_mcount)
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
 NOKPROBE(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)  // ftrace_gra
 
mcount_exit
 ENDPROC(ftrace_caller)
-#endif /* CONFIG_DYNAMIC_FTRACE */
-
-ENTRY(ftrace_stub)
-   ret
-ENDPROC(ftrace_stub)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 /*
@@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
 
mcount_exit
 ENDPROC(ftrace_graph_caller)
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+   .macro  ftrace_regs_entry, allregs=0
+   /* make room for pt_regs, plus a callee frame */
+   sub sp, sp, #(S_FRAME_SIZE + 16)
+
+   /* save function arguments */
+   stp x0, x1, [sp, #S_X0]
+   stp x2, x3, [sp, #S_X2]
+   stp x4, x5, [sp, #S_X4]
+   stp x6, x7, [sp, #S_X6]
+   stp x8, x9, [sp, #S_X8]
 
+   .if \allregs == 1
+   stp x10, x11, [sp, #S_X10]
+   stp x12, x13, [sp, #S_X12]
+   stp x14, x15, [sp, #S_X14]
+   stp x16, x17, [sp, #S_X16]
+   stp x18, x19, [sp, #S_X18]
+   stp x20, x21, [sp, #S_X20]
+   stp x22, x23, [sp, #S_X22]
+   stp x24, x25, [sp, #S_X24]
+   stp x26, x27, [sp, #S_X26]
+   .endif
+
+   /* Save fp and x28, which is used in this function. */
+   stp x28, x29, [sp, #S_X28]
+
+   /* The stack pointer as it was on ftrace_caller entry... */
+   add x28, sp, #(S_FRAME_SIZE + 16)
+   /* ...and the link Register at callee entry */
+   stp x9, x28, [sp, #S_LR]/* to pt_regs.r[30] and .sp */
+
+   /* The program counter just after the ftrace call site */
+   str lr, [sp, #S_PC]
+
+   /* Now fill in callee's preliminary stackframe. */
+   stp x29, x9, [sp, #S_FRAME_SIZE]
+   /* Let FP point to it. */
+   add x29, sp, #S_FRAME_SIZE
+
+   /* Our stackframe, stored inside pt_regs. */
+   stp x29, x30, [sp, #S_STACKFRAME]
+   add x29, sp, #S_STACKFRAME
+   .endm
+
+ENTRY(ftrace_regs_caller)
+   ftrace_regs_entry   1
+   b   ftrace_common
+ENDPROC(ftrace_regs_caller)
+
+ENTRY(ftrace_caller)
+   ftrace_regs_entry   0
+   b   ftrace_common
+ENDPROC(ftrace_caller)
+
+ENTRY(ftrace_common)
+
+   mov x3, sp  /* pt_regs are @sp */
+   ldr_l   x2, function_trace_op, x0
+   mov x1, x9  /* parent IP */
+   sub x0, lr, #8  /* function entry == IP */

[PATCH v7 3/3] arm64: use -fpatchable-function-entry if available

2019-01-18 Thread Torsten Duwe
Test whether gcc supports -fpatchable-function-entry and use it to promote
DYNAMIC_FTRACE to DYNAMIC_FTRACE_WITH_REGS. Amend support for the new object
section that holds the locations (__patchable_function_entries) and define
a proper "notrace" attribute to switch it off.

Signed-off-by: Torsten Duwe 

---
 arch/arm64/Kconfig|2 ++
 arch/arm64/Makefile   |5 +
 include/asm-generic/vmlinux.lds.h |1 +
 include/linux/compiler_types.h|2 ++
 4 files changed, 10 insertions(+)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -132,6 +132,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -89,6 +89,11 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FENTRY
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -113,6 +113,7 @@
 #define MCOUNT_REC()   . = ALIGN(8);   \
__start_mcount_loc = .; \
KEEP(*(__mcount_loc))   \
+   KEEP(*(__patchable_function_entries))   \
__stop_mcount_loc = .;
 #else
 #define MCOUNT_REC()
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -112,6 +112,8 @@ struct ftrace_likely_data {
 
 #if defined(CC_USING_HOTPATCH)
 #define notrace		__attribute__((hotpatch(0, 0)))
+#elif defined(CC_USING_PATCHABLE_FENTRY)
+#define notrace		__attribute__((patchable_function_entry(0)))
 #else
 #define notrace		__attribute__((__no_instrument_function__))
 #endif


[PATCH v7 0/3] arm64: ftrace with regs

2019-01-18 Thread Torsten Duwe
So here's v7 of ftrace with regs only. I've split out the CC_FLAGS_FTRACE
cleanup and the gcc activation into separate patches, respectively. The set
should include all of Mark's requested changes. Most notably, it now patches
in the first insn "mov x9, lr" right at startup, to avoid the races we
discussed; I'm conveniently abusing the initial _make_nop for that. The empty
mcount: routine caused a lot of Q's, so it's gone now.

I updated the accompanying livepatch patches here as well, in case somebody is
interested ;) They have only been updated to match this current ftrace-regs set,
not more.

The whole series applies cleanly on 5.0-rc2

in detail:

changes since v6:

* change the stack layout once more; I hope I have it the "standard" way now.
  And yes, it looks simpler and cleaner; thanks, Mark, for nagging.

* split out the independent Kconfig and Makefile changes

* fixed style issues

* s/fp/x29/g

* MCOUNT_ADDR is now merely a 64-bit magic value, as this is totally sufficient.

* QUICK_LR_SAVE renamed back to MOV_X9_X30.

* place MOV_X9_X30 insns on bootup, and only flip b <-> nop at runtime

* graph tracer "ifdeffery" reshuffle

Torsten
  


Re: [PATCH v6] arm64: implement ftrace with regs

2019-01-17 Thread Torsten Duwe
On Wed, Jan 16, 2019 at 09:57:39AM +, Julien Thierry wrote:
> 
> I wanted to test this patch (and try to benchmark having the "mov x9,
> x30" always present in function prelude vs having two nops), but I
> cannot get this patch to apply (despite having a version including both
> commits below).
> 
> Could you provide a git branch from which I could try to rebase the
> patch? (Or a new version of the series)

It also applies with just a single, harmless fuzz 1 onto 5.0-rc2.
Does that help?

I'm currently working on v7, with Mark's requests included and the mov patching
done only once at bootup...

Torsten



Re: ppc64le reliable stack unwinder and scheduled tasks

2019-01-13 Thread Torsten Duwe
On Sun, 13 Jan 2019 23:33:56 +1100
Balbir Singh  wrote:

> On Sat, Jan 12, 2019 at 02:45:41AM -0600, Segher Boessenkool wrote:
> > On Sat, Jan 12, 2019 at 12:09:14PM +1100, Balbir Singh wrote:
> > > Could you please define interesting frame on top a bit more?
> > > Usually the topmost return address is in LR
> > 
> > There is no reliable way (other than DWARF unwind info) to find out
> > where the value of LR at function entry currently lives (if
> > anywhere). It may or may not be still available in LR, it may or
> > may not be saved to the return stack slot.  It can also live in
> > some GPR, or in some other stack slot.
> > 
> > (The same is true for all other registers).
> > 
> > The only thing the ABI guarantees you is that you can find all
> > stack frames via the back chain.  If you want more you can use some
> > heuristics and do some heroics (like GDB does), but this is not
> > fully reliable.  Using DWARF unwind info is, but that requires big
> > tables.
> >
> 
> Thanks, so are you suggesting that a reliable stack is not possible on
> ppc64le? Even with the restricted scope of the kernel?

The LR value location is _always_ hard to determine for the topmost
frame. This is not a problem for voluntarily sleeping tasks, because the
topmost function will always be well known. It is a problem for tasks preempted
by an interrupt, or those handling an exception; that's why these need
to report "unreliable".

Note that this is a very general problem, across _all_ RISC-like
architectures. It should thus be handled as generically as possible.

Torsten



Re: [PATCH v6] arm64: implement ftrace with regs

2019-01-05 Thread Torsten Duwe
On Fri, Jan 04, 2019 at 11:41:45PM +0100, Torsten Duwe wrote:
> On Fri, Jan 04, 2019 at 01:06:48PM -0500, Steven Rostedt wrote:
> > On Fri, 4 Jan 2019 17:50:18 +
> > Mark Rutland  wrote:
> > 
> > > At Linux Plumbers, I had a conversation with Steve Rostedt, and we came
> > > to the conclusion that (without heavyweight synchronization) patching two
> > > NOPs at runtime isn't safe, since a CPU might have executed the first
> > > NOP as a NOP before another CPU patches both instructions. So a CPU
> > > might execute:
> > > 
> > >   NOP
> > >   BL  ftrace_regs_caller
> > > 
> > > ... rather than the expected:
> > > 
> > >   MOV X9, X30
> > >   BL  ftrace_regs_caller
> > > 
> > > ... and therefore X9 contains some UNKNOWN value, rather than the
> > > original LR value.
> 
> I'm perfectly aware of that; an earlier version had barriers, attempting
> to avoid just that, which Mark(?) wrote weren't necessary.
> 
> But is this a realistic scenario? All function entries are aligned to 8 bytes.
> Are there arm64 implementations out there that fetch only 4 bytes and
> give a chance to mess with the 2nd 4 bytes? You at arm.com should know, and
> I won't be surprised if the answer is a weird "yes". Or maybe it's just
> another erratum lurking somewhere...
> 
> My point is: those 2 insn will _never_ be split by any alignment
> boundary > 8; does that mean anything, have you considered this?

Forget that. Steve mentioned the keyword *interrupt*, which creates a
completely different situation. In short, only the instruction pointer
will be saved; and i-cache and pipeline will be freshly reloaded on return,
so this threat is highly unlikely (interrupt taken exactly after 1st nop),
but not impossible. "Puking horses..." as we say in German.

> > > I wonder if we could solve that by patching the kernel at build-time, to
> > > add the MOV X9, X30 in place of the first NOP. If we were to do that, we
> > > could also update the addresses to point at the second NOP, simplifying
> > > the changes to the runtime code.
> > 
> > You can also patch it at boot up when there's only one CPU running, and
> > interrupts are disabled.
> 
> May I remind you about possible performance hits? Even the NOPs had a tiny impact
> on certain in-order implementations. I'd rather switch between the mov and
> a "b +2".

This one however still holds.

Torsten



Re: [PATCH v6] arm64: implement ftrace with regs

2019-01-04 Thread Torsten Duwe
On Fri, Jan 04, 2019 at 01:06:48PM -0500, Steven Rostedt wrote:
> On Fri, 4 Jan 2019 17:50:18 +
> Mark Rutland  wrote:
> 
> > At Linux Plumbers, I had a conversation with Steve Rostedt, and we came
> > to the conclusion that (without heavyweight synchronization) patching two
> > NOPs at runtime isn't safe, since a CPU might have executed the first
> > NOP as a NOP before another CPU patches both instructions. So a CPU
> > might execute:
> > 
> > NOP
> > BL  ftrace_regs_caller
> > 
> > ... rather than the expected:
> > 
> > MOV X9, X30
> > BL  ftrace_regs_caller
> > 
> > ... and therefore X9 contains some UNKNOWN value, rather than the
> > original LR value.

I'm perfectly aware of that; an earlier version had barriers, attempting
to avoid just that, which Mark(?) wrote weren't necessary.

But is this a realistic scenario? All function entries are aligned to 8 bytes.
Are there arm64 implementations out there that fetch only 4 bytes and
give a chance to mess with the 2nd 4 bytes? You at arm.com should know, and
I won't be surprised if the answer is a weird "yes". Or maybe it's just
another erratum lurking somewhere...

My point is: those 2 insn will _never_ be split by any alignment
boundary > 8; does that mean anything, have you considered this?

> > I wonder if we could solve that by patching the kernel at build-time, to
> > add the MOV X9, X30 in place of the first NOP. If we were to do that, we
> > could also update the addresses to point at the second NOP, simplifying
> > the changes to the runtime code.
> 
> You can also patch it at boot up when there's only one CPU running, and
> interrupts are disabled.

May I remind you about possible performance hits? Even the NOPs had a tiny impact
on certain in-order implementations. I'd rather switch between the mov and
a "b +2".

Torsten



Re: [PATCH v5] arm64: implement ftrace with regs

2019-01-04 Thread Torsten Duwe
On Mon, Dec 17, 2018 at 09:52:04AM +0530, Amit Daniel Kachhap wrote:
> There is no error message or crash but no useful output like below,
> 
> /sys/kernel/tracing # echo wake_up_process > set_graph_function
> /sys/kernel/tracing # echo function_graph > current_tracer
> /sys/kernel/tracing # cat trace
> # tracer: function_graph
> #
> # CPU  DURATION  FUNCTION CALLS
> # | |   | |   |   |   |

It turned out the graph caller works perfectly; I only broke the filtering.
Fixed now in v6; thanks a lot for testing!

Torsten



[PATCH v6] arm64: implement ftrace with regs

2019-01-04 Thread Torsten Duwe
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
of each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section.

Signed-off-by: Torsten Duwe 

---

This patch applies on 4.20 with the additional changes
bdb85cd1d20669dfae813555dddb745ad09323ba
(arm64/module: switch to ADRP/ADD sequences for PLT entries)
and
7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680
(arm64: ftrace: always pass instrumented pc in x0)
along with their respective series, or alternatively on Linus' master,
which already has these.

changes since v5:

* fix mentioned pc in x0 to hold the start address of the call site,
  not the return address or the branch address.
  This resolves the problem found by Amit.

---
 arch/arm64/Kconfig|2 
 arch/arm64/Makefile   |4 +
 arch/arm64/include/asm/assembler.h|1 
 arch/arm64/include/asm/ftrace.h   |   13 +++
 arch/arm64/include/asm/module.h   |3 
 arch/arm64/kernel/Makefile|6 -
 arch/arm64/kernel/entry-ftrace.S  |  131 ++
 arch/arm64/kernel/ftrace.c|  125 
 arch/arm64/kernel/module-plts.c   |3 
 arch/arm64/kernel/module.c|2 
 drivers/firmware/efi/libstub/Makefile |3 
 include/asm-generic/vmlinux.lds.h |1 
 include/linux/compiler_types.h|4 +
 13 files changed, 262 insertions(+), 36 deletions(-)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -131,6 +131,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -79,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -17,6 +17,19 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -13,7 +13,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie \
   $(DISABLE_STACKLEAK_PLUGIN)
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic \
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -122,6 +124,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef

Re: [PATCH v5] arm64: implement ftrace with regs

2018-12-17 Thread Torsten Duwe
On Mon, Dec 17, 2018 at 09:52:04AM +0530, Amit Daniel Kachhap wrote:
> There is no error message or crash but no useful output like below,
> 
> /sys/kernel/tracing # echo wake_up_process > set_graph_function
> /sys/kernel/tracing # echo function_graph > current_tracer
> /sys/kernel/tracing # cat trace
> # tracer: function_graph
> #
> # CPU  DURATION  FUNCTION CALLS
> # | |   | |   |   |   |

Confirmed. The graph tracer self-test passes, but when used, there is no
output.

Torsten



Re: [PATCH v5] arm64: implement ftrace with regs

2018-12-15 Thread Torsten Duwe
On Fri, 14 Dec 2018 21:45:03 +0530
Amit Daniel Kachhap  wrote:

> Sorry I didn't mention my environment. I am using 4.20-rc3 and it has
> all the above 8 extra patches
> mentioned by you.

So that should be fine.

> I read your change description in v3 patchset. You had mentioned there
> that graph caller
> is broken.

No, actually I thought I had fixed that for v4. Would you care to show
us an actual error message or some symptom?

Torsten


Re: [PATCH v5] arm64: implement ftrace with regs

2018-12-14 Thread Torsten Duwe
On Thu, Dec 13, 2018 at 11:01:38PM +0530, Amit Daniel Kachhap wrote:
> On Fri, Nov 30, 2018 at 9:53 PM Torsten Duwe  wrote:
> 
> Hi Torsten,
> 
> I was testing your patch and it seems to work fine for function trace. However
> function graph trace seems broken. Is it work in progress ? or I am
> missing something.

What did you base your tests on? You didn't specify.
OTOH neither did I ;-) I assumed all addressees had the full picture.

Precisely, I basically start with 4.19, but 4.20-rc shouldn't make a
difference. BUT...

> > [Changes from v4]
> >
> > * include Ard's feedback and pending changes:
> >   - ADR/ADRP veneers
> >   - ftrace_trampolines[2]
> >   - add a .req for fp, just in case
" [PATCH 1/2] arm64/insn: add support for emitting ADR/ADRP instructions "
<20181122084646.3247-2-ard.biesheu...@linaro.org> et.al, esp:
Git-commit: bdb85cd1d20669dfae813555dddb745ad09323ba

> >   - comment on the pt_regs.stackframe[2] mapping
> > * include Mark's ftrace cleanup
> >   - GLOBAL()
> >   - prepare_ftrace_return(pc, , fp)
> >
" [PATCH 1/6] linkage: add generic GLOBAL() macro "
<20181115224203.24847-2-mark.rutl...@arm.com> et.al., esp:
Git-commit: 7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680

change the API this patch set relies on. AFAIU they are on their way into
mainline so I updated v5 accordingly. If you don't have these, just use v4;
the other changes are only for compatibility and cosmetics.

HTH,
Torsten



[PATCH v5] arm64: implement ftrace with regs

2018-11-30 Thread Torsten Duwe
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
of each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, together
with ftrace_trampoline, and double the size of this special section
if .text.ftrace_trampoline is present in the module.

Signed-off-by: Torsten Duwe 

---

As reliable stack traces are still being discussed, here is
dynamic ftrace with regs only, which has a value of its own.
I can see Mark has broken down an earlier version into smaller
patches; just let me know if you prefer that.

[Changes from v4]

* include Ard's feedback and pending changes:
  - ADR/ADRP veneers
  - ftrace_trampolines[2]
  - add a .req for fp, just in case
  - comment on the pt_regs.stackframe[2] mapping
* include Mark's ftrace cleanup
  - GLOBAL()
  - prepare_ftrace_return(pc, , fp)

---
 arch/arm64/Kconfig|2 
 arch/arm64/Makefile   |4 +
 arch/arm64/include/asm/assembler.h|1 
 arch/arm64/include/asm/ftrace.h   |   13 +++
 arch/arm64/include/asm/module.h   |3 
 arch/arm64/kernel/Makefile|6 -
 arch/arm64/kernel/entry-ftrace.S  |  130 ++
 arch/arm64/kernel/ftrace.c|  125 +---
 arch/arm64/kernel/module-plts.c   |3 
 arch/arm64/kernel/module.c|2 
 drivers/firmware/efi/libstub/Makefile |3 
 include/asm-generic/vmlinux.lds.h |1 
 include/linux/compiler_types.h|4 +
 13 files changed, 261 insertions(+), 36 deletions(-)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -125,6 +125,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -79,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -17,6 +17,19 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -13,7 +13,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie \
   $(DISABLE_STACKLEAK_PLUGIN)
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -122,6 +124,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else

Re: [PATCH v4 1/3] arm64: implement ftrace with regs

2018-11-12 Thread Torsten Duwe
On Thu, Nov 08, 2018 at 01:12:42PM +0100, Ard Biesheuvel wrote:
> 
> On 26 October 2018 at 16:21, Torsten Duwe  wrote:
> > @@ -162,6 +165,114 @@ ftrace_graph_call:// 
> > ftrace_graph_cal
> >
> > mcount_exit
> >  ENDPROC(ftrace_caller)
> > +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > +
> > +/*
> > + * Since no -pg or similar compiler flag is used, there should really be
> > + * no reference to _mcount; so do not define one. Only some value for
> > + * MCOUNT_ADDR is needed for comparison. Let it point here to have some
> > + * sort of magic value that can be recognised when debugging.
> > + */
> > +   .global _mcount
> > +_mcount:
> > +   ret /* make it differ from regs caller */
> > +
> > +ENTRY(ftrace_regs_caller)
> > +   /* callee's preliminary stack frame: */
> > +   stp fp, x9, [sp, #-16]!
> 
> Does the 'fp' alias for x29 work with older assemblers? I guess it
> does not matter gor GCC 8+ code, but be careful when you rewrite
> existing stuff.

I had gotten the impression the fp alias had been there all along, so I used
it for readability. Thanks for the notification, I'll double check.

> > +   mov fp, sp
> > +
> > +   /* our stack frame: */
> > +   stp fp, lr, [sp, #-S_FRAME_SIZE]!
> 
> If sizeof(struct pt_regs) == S_FRAME_SIZE), you should subtract 16
> additional bytes here

This is intentional :-]

At the end of pt_regs there's a "stackframe", which now aligns with the
"preliminary" frame I create for the callee. Please tell me what the struct
member is good for if not for an actual callee stack frame...
I thought it was a neat idea.

> > +
> > +ftrace_common:
> > +   /*
> > +* At this point we have 2 new stack frames, and x9 pointing
> > +* at a pt_regs which we can populate as needed.
> > +*/
> > +
> > +   /* save function arguments */
> > +   stp x0, x1, [x9]
> > +   stp x2, x3, [x9, #S_X2]
> > +   stp x4, x5, [x9, #S_X4]
> > +   stp x6, x7, [x9, #S_X6]
> > +   stp x8, x9, [x9, #S_X8]
> > +
> 
> x9 is not a function argument, and if it were, you'd have clobbered it
> by now. Please use a single 'str' and store x8 only

This way the x9 slot in pt_regs will be undefined. Is that ok with everybody?

> > +ftrace_common_return:
> > +   add x9, sp, #16 /* advance to pt_regs for restore */
> > +
> > +   ldp x0, x1, [x9]
> > +   ldp x2, x3, [x9, #S_X2]
> > +   ldp x4, x5, [x9, #S_X4]
> > +   ldp x6, x7, [x9, #S_X6]
> > +   ldp x8, x9, [x9, #S_X8]
> > +
> 
> Same as above. It also deserves a mention that you are relying on the
> absence of IPA-RA, and so x9..x18 are guaranteed to be dead at
> function entry, and so they don't need to be restored here.

Sure, I can quote some ABI spec here :-/
I just wish all arm code was documented this well.

> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -65,18 +65,61 @@ int ftrace_update_ftrace_func(ftrace_fun
> > return ftrace_modify_code(pc, 0, new, false);
> >  }
> >
> > +#ifdef CONFIG_ARM64_MODULE_PLTS
> > +static int install_ftrace_trampoline(struct module *mod, unsigned long 
> > *addr)
> > +{
> > +   struct plt_entry trampoline, *mod_trampoline;
> > +   trampoline = get_plt_entry(*addr);
> > +
> > +   if (*addr == FTRACE_ADDR)
> > +   mod_trampoline = mod->arch.ftrace_trampoline;
> > +   else if (*addr == FTRACE_REGS_ADDR)
> > +   mod_trampoline = mod->arch.ftrace_regs_trampoline;
> 
> Could we do something like
> 
> if (*addr == FTRACE_ADDR)
> mod_trampoline = &mod->arch.ftrace_trampoline[0];
> else if (*addr == FTRACE_REGS_ADDR)
> mod_trampoline = &mod->arch.ftrace_trampoline[1];
> 
> and get rid of the additional struct field and pointer?

"0" and "1" won't make it obvious which one has the regs tracing, but besides
that, I like the idea of making this a small array. Other opinions?
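
To keep the index meaning visible at the call site, the lookup could end up looking
like this (just a sketch of that direction, with ftrace_trampolines[] sized by a
#define):

	struct plt_entry *mod_trampoline;

	/* [0]: plain ftrace_caller, [1]: ftrace_regs_caller */
	if (*addr == FTRACE_ADDR)
		mod_trampoline = &mod->arch.ftrace_trampolines[0];
	else if (*addr == FTRACE_REGS_ADDR)
		mod_trampoline = &mod->arch.ftrace_trampolines[1];
	else
		return -EINVAL;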

Torsten



Re: [PATCH v4 2/3] arm64: implement live patching

2018-11-12 Thread Torsten Duwe
On Thu, Nov 08, 2018 at 01:42:35PM +0100, Ard Biesheuvel wrote:
> On 26 October 2018 at 16:21, Torsten Duwe  wrote:
> > /* The program counter just after the ftrace call site */
> > str lr, [x9, #S_PC]
> > +
> > /* The stack pointer as it was on ftrace_caller entry... */
> > add x28, fp, #16
> > str x28, [x9, #S_SP]
> 
> Please drop this hunk

Sure. I missed that one during cleanup.

> > @@ -233,6 +234,10 @@ ftrace_common:
 ^^
> > ldr x28, [fp, 8]
> > str x28, [x9, #S_LR]/* to pt_regs.r[30] */
> >
> > +#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
> > +   mov x28, lr /* remember old return address */
> > +#endif
> > +
> > ldr_l   x2, function_trace_op, x0
> > ldr x1, [fp, #8]
> > sub x0, lr, #8  /* function entry == IP */
> > @@ -245,6 +250,17 @@ ftrace_call:
> >
> > bl  ftrace_stub
> >
> > +#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
> > +   /* Is the trace function a live patcher and has messed with
> > +* the return address?
> > +*/
> > +   add x9, sp, #16 /* advance to pt_regs for restore */
> > +   ldr x0, [x9, #S_PC]
> > +   cmp x0, x28 /* compare with the value we remembered */
> > +   /* to not call graph tracer's "call" mechanism twice! */
> > +   b.neftrace_common_return
> 
> Is ftrace_common_return guaranteed to be in range? Conditional
> branches have only -/+ 1 MB range IIRC.

It's the same function. A "1f" would do the same job, but the long label
is a talking identifier that saves a comment. I'd be more worried about
the return from the graph trace caller, which happens to be the _next_
function ;-)

If ftrace_caller or graph_caller grow larger than a meg, something else is
_very_ wrong.

> > +#endif
> > +
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> 
> Can we fold these #ifdef blocks together (i.e, incorporate the
> conditional livepatch sequence here)

I'll see how to make it fit. But remember some people might want ftrace
but no live patching capability.

Thanks for the review!

Torsten


Re: [PATCH v4 1/3] arm64: implement ftrace with regs

2018-10-31 Thread Torsten Duwe
On Wed, 31 Oct 2018 14:18:19 +
Mark Rutland  wrote:

> On Wed, Oct 31, 2018 at 02:19:07PM +0100, Jiri Kosina wrote:

> > Other architectures do rely on that. That's exactly for example why
> > on x86 we use '-pg -mfentry', to make sure we hook the function
> > *before* prologue.  
> 
> Ah, I'd missed -mfentry for x86. I now see that's also the case with
> __gnu_mcount_nc on arch/arm, so that covers my confusion.

Yes, fentry used to be the prerequisite, but it's everything but
portable. PPC64 already had the profile-kernel switch, which was
becoming just usable as we got at live patching.

I'm hoping that the patchable-function-entry will become the future
de-facto standard.

Torsten


[PATCH v4 1/3] arm64: implement ftrace with regs

2018-10-26 Thread Torsten Duwe
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
of each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section
if .text.ftrace_trampoline is present in the module.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -78,6 +78,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -16,6 +16,19 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -11,7 +11,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
   -fPIC -fno-strict-aliasing -mno-red-zone \
   -mno-mmx -mno-sse -fshort-wchar
 
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -123,6 +125,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -162,6 +165,114 @@ ftrace_graph_call:// 
ftrace_graph_cal
 
mcount_exit
 ENDPROC(ftrace_caller)
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+/*
+ * Since no -pg or similar compiler flag is used, there should really be
+ * no reference to _mcount; so do not define one. Only some value for
+ * MCOUNT_ADDR is needed for comparison. Let it point here to have some
+ * sort of magic value that can be recognised when debugging.
+ */
+   .global _mcount
+_mcount:
+   ret /* make it differ from regs caller */
+
+ENTRY(ftrace_regs_caller)
+   /* callee's preliminary stack frame: */
+   stp fp, x9, [sp, #-16]!
+   mov fp, sp
+
+   /* our stack frame: */
+   stp fp, lr, [sp, #-S_FRAME_SIZE]!
+   add x9, sp, #16 /* offset to pt_regs */
+
+   stp x10, x11, [x9, #S_X10]
+   stp x12, x13, [x9, #S_X12]
+   stp x14, x15, [x9, #S_X14]
+   stp x16, x17, [x9, #S_X16]
+   stp x18, x19, [x9, #S_X18]
+   stp x20, x21

[PATCH v4 2/3] arm64: implement live patching

2018-10-26 Thread Torsten Duwe
Based on ftrace with regs, do the usual thing.
(see Documentation/livepatch/livepatch.txt)

Use task flag bit 6 to track patch transition state for the consistency
model. Add it to the work mask so it gets cleared on all kernel exits to
userland.

Tell livepatch regs->pc is the place to change the return address.
Make sure the graph tracer call hook is only called on the final function
entry in case regs->pc gets modified after an interception.
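
(Not part of the patch: for readers unfamiliar with the livepatch core, its ftrace
handler redirects execution essentially as sketched below, which is why
klp_arch_set_pc() only needs to rewrite regs->pc. The lookup helper is paraphrased;
see kernel/livepatch/patch.c for the real thing.)

	static void notrace klp_ftrace_handler_sketch(unsigned long ip,
						      unsigned long parent_ip,
						      struct ftrace_ops *fops,
						      struct pt_regs *regs)
	{
		struct klp_func *func;

		func = lookup_patched_func(fops);	/* paraphrased, hypothetical helper */
		if (func)
			klp_arch_set_pc(regs, (unsigned long)func->new_func);
	}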

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -120,6 +120,7 @@ config ARM64
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
+   select HAVE_LIVEPATCH
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI
@@ -1350,4 +1351,6 @@ if CRYPTO
 source "arch/arm64/crypto/Kconfig"
 endif
 
+source "kernel/livepatch/Kconfig"
+
 source "lib/Kconfig"
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -76,6 +76,7 @@ void arch_release_task_struct(struct tas
 #define TIF_FOREIGN_FPSTATE3   /* CPU's FP state is not current's */
 #define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK5   /* Check FS is USER_DS on return */
+#define TIF_PATCH_PENDING  6
 #define TIF_NOHZ   7
 #define TIF_SYSCALL_TRACE  8
 #define TIF_SYSCALL_AUDIT  9
@@ -94,6 +95,7 @@ void arch_release_task_struct(struct tas
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE   (1 << TIF_FOREIGN_FPSTATE)
+#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -106,7 +108,8 @@ void arch_release_task_struct(struct tas
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-_TIF_UPROBE | _TIF_FSCHECK)
+_TIF_UPROBE | _TIF_FSCHECK | \
+_TIF_PATCH_PENDING)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
--- /dev/null
+++ b/arch/arm64/include/asm/livepatch.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * livepatch.h - arm64-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2016,2018 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_LIVEPATCH_H
+#define _ASM_ARM64_LIVEPATCH_H
+
+#include 
+
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->pc = ip;
+}
+
+#endif /* _ASM_ARM64_LIVEPATCH_H */
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -226,6 +226,7 @@ ftrace_common:
 
/* The program counter just after the ftrace call site */
str lr, [x9, #S_PC]
+
/* The stack pointer as it was on ftrace_caller entry... */
add x28, fp, #16
str x28, [x9, #S_SP]
@@ -233,6 +234,10 @@ ftrace_common:
ldr x28, [fp, 8]
str x28, [x9, #S_LR]/* to pt_regs.r[30] */
 
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   mov x28, lr /* remember old return address */
+#endif
+
ldr_l   x2, function_trace_op, x0
ldr x1, [fp, #8]
sub x0, lr, #8  /* function entry == IP */
@@ -245,6 +250,17 @@ ftrace_call:
 
bl  ftrace_stub
 
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   /* Is the trace function a live patcher and has messed with
+* the return address?
+*/
+   add x9, sp, #16 /* advance to pt_regs for restore */
+   ldr x0, [x9, #S_PC]
+   cmp x0, x28 /* compare with the value we remembered */
+   /* to not call graph tracer's "call" mechan

[PATCH v4 1/3] arm64: implement ftrace with regs

2018-10-26 Thread Torsten Duwe
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
of each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.
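
For illustration, a stand-alone C sketch of the layout this creates (plain
userspace code, not kernel code; the NOP and "mov x9, x30" encodings are the
standard AArch64 ones, everything else is made up for the example):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AARCH64_INSN_SIZE	4
#define AARCH64_NOP		0xd503201fu	/* "nop" */
#define INSN_MOV_X9_X30		0xaa1e03e9u	/* "mov x9, x30" == "orr x9, xzr, x30" */
#define REC_IP_BRANCH_OFFSET	AARCH64_INSN_SIZE

/* What -fpatchable-function-entry=2 emits at each function entry:
 * two NOPs that dynamic ftrace may later rewrite. */
static uint32_t prologue[2] = { AARCH64_NOP, AARCH64_NOP };

int main(void)
{
	/* The insn that becomes "bl ftrace_caller" is the *second* slot,
	 * so the ftrace record's patch site is entry + one insn. */
	size_t patch_slot = REC_IP_BRANCH_OFFSET / AARCH64_INSN_SIZE;

	/* "Enabling" tracing: slot 0 becomes the LR saver; slot 1 would
	 * become the bl, whose encoding depends on the branch distance. */
	prologue[0] = INSN_MOV_X9_X30;

	printf("patch site: slot %zu (entry + %d bytes), first insn %#010x\n",
	       patch_slot, REC_IP_BRANCH_OFFSET, (unsigned int)prologue[0]);
	return 0;
}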

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section
if .text.ftrace_trampoline is present in the module.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -78,6 +78,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -16,6 +16,19 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. In contrast
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o \
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -11,7 +11,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
   -fPIC -fno-strict-aliasing -mno-red-zone \
   -mno-mmx -mno-sse -fshort-wchar
 
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -123,6 +125,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -162,6 +165,114 @@ ftrace_graph_call:// 
ftrace_graph_cal
 
mcount_exit
 ENDPROC(ftrace_caller)
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+
+/*
+ * Since no -pg or similar compiler flag is used, there should really be
+ * no reference to _mcount; so do not define one. Only some value for
+ * MCOUNT_ADDR is needed for comparison. Let it point here to have some
+ * sort of magic value that can be recognised when debugging.
+ */
+   .global _mcount
+_mcount:
+   ret /* make it differ from regs caller */
+
+ENTRY(ftrace_regs_caller)
+   /* callee's preliminary stack frame: */
+   stp fp, x9, [sp, #-16]!
+   mov fp, sp
+
+   /* our stack frame: */
+   stp fp, lr, [sp, #-S_FRAME_SIZE]!
+   add x9, sp, #16 /* offset to pt_regs */
+
+   stp x10, x11, [x9, #S_X10]
+   stp x12, x13, [x9, #S_X12]
+   stp x14, x15, [x9, #S_X14]
+   stp x16, x17, [x9, #S_X16]
+   stp x18, x19, [x9, #S_X18]
+   stp x20, x21

[PATCH v4 3/3] arm64: reliable stacktraces

2018-10-26 Thread Torsten Duwe
Enhance the stack unwinder so that it reports whether it had to stop
normally or due to an error condition; unwind_frame() will report
continue/error/normal ending and walk_stackframe() will pass that
info. __save_stack_trace() is used to check the validity of a stack;
save_stack_trace_tsk_reliable() can now trivially be implemented.
Modify arch/arm64/kernel/time.c as the only external caller so far
to recognise the new semantics.

I had to introduce a marker symbol kthread_return_to_user to tell
the normal origin of a kernel thread.
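
For illustration, a stand-alone C sketch of the new return convention
(simplified, made-up frame handling, not the kernel code): unwind_frame()
returns 0 to continue, 1 for a normal end of stack and a negative error when
the data looks unreliable; walk_stackframe() propagates that, so a
reliable-stacktrace caller can reject the result.

#include <errno.h>
#include <stdio.h>

struct frame { unsigned long fp, pc; };

/* 0: more frames to go, 1: clean end of stack, <0: unreliable info. */
static int unwind_frame(struct frame *f)
{
	if (!f->fp && !f->pc)
		return 1;		/* normal end (e.g. kthread bottom) */
	if (f->fp & 0xf)
		return -EINVAL;		/* misaligned frame pointer: give up */
	/* ... follow the frame record; here we just fake one step ... */
	f->fp = f->pc = 0;
	return 0;
}

static int walk_stackframe(struct frame *f,
			   int (*fn)(struct frame *, void *), void *data)
{
	while (1) {
		int ret = fn(f, data);
		if (ret)
			return ret;	/* consumer asked to stop */
		ret = unwind_frame(f);
		if (ret < 0)
			return ret;	/* unreliable: propagate the error */
		if (ret > 0)
			break;		/* clean end of stack */
	}
	return 0;
}

static int print_entry(struct frame *f, void *data)
{
	(void)data;
	printf("  pc=%#lx\n", f->pc);
	return 0;
}

int main(void)
{
	struct frame good = { .fp = 0x0,    .pc = 0x0 };
	struct frame bad  = { .fp = 0x1003, .pc = 0x1234 };

	printf("good stack -> %d\n", walk_stackframe(&good, print_entry, NULL));
	printf("bad stack  -> %d\n", walk_stackframe(&bad,  print_entry, NULL));
	return 0;
}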

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -128,8 +128,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
+   select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -33,7 +33,7 @@ struct stackframe {
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+extern int walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
 
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -40,6 +40,16 @@
  * ldp x29, x30, [sp]
  * add sp, sp, #0x10
  */
+
+/* The bottom of kernel thread stacks points there */
+extern void *kthread_return_to_user;
+
+/*
+ * unwind_frame -- unwind a single stack frame.
+ * Returns 0 when there are more frames to go.
+ * 1 means reached end of stack; negative (error)
+ * means stopped because information is not reliable.
+ */
 int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 {
unsigned long fp = frame->fp;
@@ -75,29 +85,39 @@ int notrace unwind_frame(struct task_str
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
/*
+* kthreads created via copy_thread() (called from kthread_create())
+* will have a zero BP and a return value into ret_from_fork.
+*/
+   if (!frame->fp && frame->pc == (unsigned long)&kthread_return_to_user)
+   return 1;
+   /*
 * Frames created upon entry from EL0 have NULL FP and PC values, so
 * don't bother reporting these. Frames created by __noreturn functions
 * might have a valid FP even if PC is bogus, so only terminate where
 * both are NULL.
 */
if (!frame->fp && !frame->pc)
-   return -EINVAL;
+   return 1;
 
return 0;
 }
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+int notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 int (*fn)(struct stackframe *, void *), void *data)
 {
while (1) {
int ret;
 
-   if (fn(frame, data))
-   break;
+   ret = fn(frame, data);
+   if (ret)
+   return ret;
ret = unwind_frame(tsk, frame);
if (ret < 0)
+   return ret;
+   if (ret > 0)
break;
}
+   return 0;
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -145,14 +165,15 @@ void save_stack_trace_regs(struct pt_reg
trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
-static noinline void __save_stack_trace(struct task_struct *tsk,
+static noinline int __save_stack_trace(struct task_struct *tsk,
struct stack_trace *trace, unsigned int nosched)
 {
struct stack_trace_data data;
struct stackframe frame;
+   int ret;
 
if (!try_get_task_stack(tsk))
-   return;
+   return -EBUSY;
 
data.trace = trace;
data.skip = trace->skip;
@@ -171,11 +192,12 @@ static noinline void __save_stack_trace(
frame.graph = tsk->curr_ret_stack;
 #endif
 
-   walk_stackframe(tsk, &frame, save_trace, &data);
+   ret = walk_stackframe(tsk, &frame, save_trace, &data);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 
put_task_stack(tsk);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -190,4 +212,12 @@ void save_stack_trace(struct stack_trace
 }
 
 EXPORT_SYMBOL_GPL(save_stack_trace);
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   return __save_stack_trace(tsk, trace, 1);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);

[PATCH v4 0/3] arm64 live patching

2018-10-26 Thread Torsten Duwe
Hi again!

V4 should include all your requested changes. Since only Julien
commented "OK" on the reliable stacktrace part, I finished it on my
own. This set now passes the relevant tests in Libor's test suite, so
livepatching the kernel proper does work.

Remember to apply Jessica's addendum in order to livepatch functions
that live in modules.

[Changes from v3]:

* Compiler support for -fpatchable-function-entry now automagically
  selects _WITH_REGS when DYNAMIC_FTRACE is switched on. Consequently,
  CONFIG_DYNAMIC_FTRACE_WITH_REGS is the only preprocessor symbol
  set by this feature (as asked for by Takahiro in v2)

* The dynamic ftrace caller creates 2 stack frames, as suggested by Ard:
  first a "preliminary" for the callee, and another for ftrace_caller
  itself. This gives the stack layout really a clean look.

* Because the ftrace-clobbered x9 is now saved immediately in the
  "callee" frame, it can be used to base pt_regs access. Much prettier now.

* Dynamic replacement insn "mov x9, lr" is generated using the common
  framework; a hopefully meaningful macro name is used for abbreviation.

* The use_ftrace_trampoline() helper introduced in v3 got renamed
  and streamlined with a reference variable, both as pointed out by Mark.

* Superfluous barriers during trace application removed.

* #ifdef replaced by IS_ENABLED() where possible.

* Made stuff compile with gcc7 or older, too ;-)

* Fix my misguided .text.ftrace_regs_trampoline section assumption;
  the second trampoline goes into .text.ftrace_trampoline as well.

* Properly detect the bottom of kthread stacks, by setting a global
  symbol to the address where their LR points to and compare against it.

* Rewrote many comments to hopefully clear things up.

[Changes from v2]:

* ifeq($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y) instead of ifdef

* "fix" commit 06aeaaeabf69da4. (new patch 1)
  Made DYNAMIC_FTRACE_WITH_REGS a real choice. The current situation
  would be that a linux-4.20 kernel on arm64 should be built with
  gcc >= 8; as in this case, as well as all other archs, the "default y"
  works. Only kernels >= 4.20, arm64, gcc < 8, must change this to "n"
  in order to not be stopped by the Makefile $(error) from patch 2/4.
  You'll then fall back to the DYNAMIC_FTRACE, if selected, like before.

* use some S_X* constants to refer to offsets into pt_regs in assembly.

* have the compiler/assembler generate the mov x9,x30 instruction that
  saves LR at compile time, rather than generate it repeatedly at runtime.

* flip the ftrace_regs_caller stack frame so that it is no longer
  upside down, as Ard remarked. This change broke the graph caller somehow.

* extend handling of the module arch-dependent ftrace trampoline with
  a companion "regs" version.
  
* clear the _TIF_PATCH_PENDING on do_notify_resume()

* took care of arch/arm64/kernel/time.c when changing stack unwinder
  semantics

[Changes from v1]:

* Missing compiler support is now a Makefile error, instead
  of a warning. This will keep the compile log shorter and
  it will thus be easier to spot the problem.

* A separate ftrace_regs_caller. Only that one will write out
  a complete pt_regs, for efficiency.

* Replace the use of X19 with X28 to remember the old PC during
  live patch detection, as only that is saved now for
  non-regs ftrace.

* CONFIG_DYNAMIC_FTRACE_WITH_REGS and CONFIG_DYNAMIC_FTRACE_WITH_REGS
  are currently synonymous on arm64, but differentiate better for
  the future when this is no longer the case.

* Clean up "old"/"new" insn value setting vs. #ifdefs.

* #define a INSN_MOV_X9_X30 with suggested aarch64_insn_gen call
  and use that instead of an immediate hex value.

Torsten


Re: [PATCH v3 3/4] arm64: implement live patching

2018-10-22 Thread Torsten Duwe
On Mon, Oct 22, 2018 at 02:53:10PM +0200, Miroslav Benes wrote:
> On Sat, 20 Oct 2018, Ard Biesheuvel wrote:
> > So I suppose this could get interesting in cases where modules are far
> > away from the kernel (i.e., more than -/+ 128 MB). Fortunately, the
> > modules themselves are always placed in a 128 MB window, but this
> > window could be out of reach for branches into the kernel proper. If
> > we find ourselves in the situation where we need to patch calls into
> > the kernel proper to point into this module, this may get interesting,
> > since the PLT entries *must* be allocated along with the module that
> > contains the branch instruction, or the PLT entry itself may be out of
> > range, defeating the purpose.
> 
> Hm... Torsten, didn't you have to solve something similar in powerpc case? 

At address ranges: that was the TOC pointer issue, that is an array of
addresses and other 64-bit constants with a reference to it kept in a register,
whose value is local to any module. PPC64 needs 5 instructions to load a 64-bit
value immediate, arm has the ldr_l macro. So in short: no. We were discussing a
few nasty tricks to reconstruct the TOC value from the code addresses, but
ended up with a solution that is clean in that respect.
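
For illustration only (stand-alone C; the constant is made up): the five
ppc64 instructions come from materialising the four 16-bit chunks of the
value plus one shift to put the upper half in place, whereas ldr_l is a
single PC-relative load of the value from memory.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t imm = 0xc000000001234567ull;	/* made-up 64-bit constant */
	unsigned int chunk[4];
	int i;

	/* lis/ori, shift left 32, oris/ori: four 16-bit pieces plus one
	 * shift -- five instructions on ppc64. */
	for (i = 0; i < 4; i++)
		chunk[i] = (unsigned int)((imm >> (48 - 16 * i)) & 0xffff);

	printf("imm = %#llx\n", (unsigned long long)imm);
	for (i = 0; i < 4; i++)
		printf("  chunk %d (bits %2d..%2d) = %#06x\n",
		       i, 63 - 16 * i, 48 - 16 * i, chunk[i]);
	return 0;
}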

At PLTs: ppc64 uses, IIRC, a regular trampoline slot to get to all called
functions, including ftrace and friends. So for e.g. live patch kernels
the total number, as predetermined, is only incremented by 2. Only the
trampoline _code_ for ftrace was modified.
I assume with "branches" you mean "branch with link", a.k.a. a subroutine call?
All references to external symbols are allocated a PLT entry, right?

ftrace / live patching calls are always intercepted at the called function.
   =====-
The interceptor then uses a special trampoline to go to ftrace_caller, which
does the rest.

Am I missing something?

Torsten



Re: [PATCH v3 3/4] arm64: implement live patching

2018-10-19 Thread Torsten Duwe
On Fri, Oct 19, 2018 at 01:59:01PM +0200, Miroslav Benes wrote:
> 
> Torsten, could you include the outcome to your patch set once we settle on 
> it? Thanks.

Absolutely! Whether as patch 4/4 or on its own and I refer to it -- we'll
figure it out.

Torsten



Re: [PATCH v3 2/4] arm64: implement ftrace with regs

2018-10-02 Thread Torsten Duwe
Hi Mark,

thank you for your very detailed feedback, I'll incorporate it
all into the next version, apart from one issue:

On Tue, Oct 02, 2018 at 12:27:41PM +0100, Mark Rutland wrote:
> 
> Please use the insn framework, as we do to generate all the other
> instruction sequences in ftrace.
> 
> MOV (register) is an alias of ORR (shifted register), i.e.
> 
>   mov , 
> 
> ... is:
> 
>   orr , xzr, 
> 
> ... and we have code to generate ORR, so we can add a trivial wrapper to
> generate MOV.

I had something similar in v2; but it was hardly any better to read or
understand. My main question however is: how do you justify the runtime
overhead of aarch64_insn_gen_logical_shifted_reg for every function that
gets its tracing switched on or off? The result is always the same 4-byte
constant, so why not use a macro and a comment that says what it does?
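
For illustration, that constant derived from the ORR (shifted register)
field layout, as a stand-alone C check (plain userspace code, not the
kernel's insn framework):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build "orr xd, xzr, xm" -- the MOV (register) alias -- from its fields. */
static uint32_t gen_mov_reg(unsigned int rd, unsigned int rm)
{
	uint32_t insn = 0;

	insn |= 1u << 31;		/* sf: 64-bit */
	insn |= 0x1u << 29;		/* opc: ORR */
	insn |= 0x0au << 24;		/* fixed opcode bits */
					/* shift/amount/N are all zero */
	insn |= (rm & 0x1f) << 16;	/* Rm: source register */
	insn |= 0x1fu << 5;		/* Rn: xzr */
	insn |= rd & 0x1f;		/* Rd: destination register */
	return insn;
}

int main(void)
{
	uint32_t insn = gen_mov_reg(9, 30);	/* mov x9, x30 */

	assert(insn == 0xaa1e03e9u);
	printf("mov x9, x30 -> %#010x\n", (unsigned int)insn);
	return 0;
}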

Torsten



Re: [PATCH v3 2/4] arm64: implement ftrace with regs

2018-10-02 Thread Torsten Duwe
On Mon, Oct 01, 2018 at 05:57:52PM +0200, Ard Biesheuvel wrote:
> > --- a/arch/arm64/include/asm/ftrace.h
> > +++ b/arch/arm64/include/asm/ftrace.h
> > @@ -16,6 +16,17 @@
> >  #define MCOUNT_ADDR((unsigned long)_mcount)
> >  #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
> >
> > +/* DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the 
> > beginning
> > +   of each function, with the second NOP actually calling ftrace. In 
> > contrary
> > +   to a classic _mcount call, the call instruction to be modified is thus
> > +   the second one, and not the only one. */
> 
> OK, so the first slot will be patched unconditionally to do the 'mov x9, x30' 
> ?

Right.

> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +#define ARCH_SUPPORTS_FTRACE_OPS 1
> > +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> > +#else
> > +#define REC_IP_BRANCH_OFFSET 0
> > +#endif

The main reason for the above comment was that a previous reviewer wondered
about a magic value of "4" for the REC_IP_BRANCH_OFFSET, which is actually
an insn size. The comment should leave no doubt. I'd leave the LR save
explanation elsewhere.

> > mcount_exit
> >  ENDPROC(ftrace_caller)
> > +#else /* CC_USING_PATCHABLE_FUNCTION_ENTRY */
> > +
> > +/* Since no -pg or similar compiler flag is used, there should really be
> > +   no reference to _mcount; so do not define one. Only a function address
> > +   is needed in order to refer to it. */
> > +ENTRY(_mcount)
> > +   ret /* just in case, prevent any fall through. */
> > +ENDPROC(_mcount)
> > +
> > +ENTRY(ftrace_regs_caller)
> > +   sub sp, sp, #S_FRAME_SIZE
> > +   stp x29, x9, [sp, #-16] /* FP/LR link */
> > +
> 
> You cannot write below the stack pointer. So you are missing a
> trailing ! here. Note that you can fold the sub
> 
> stp x29, x9, [sp, #-(S_FRAME_SIZE+16)]!

Very well, but...

> > +   stp x10, x11, [sp, #S_X10]
> > +   stp x12, x13, [sp, #S_X12]
> > +   stp x14, x15, [sp, #112]
> > +   stp x16, x17, [sp, #128]
> > +   stp x18, x19, [sp, #144]
> > +   stp x20, x21, [sp, #160]
> > +   stp x22, x23, [sp, #176]
> > +   stp x24, x25, [sp, #192]
> > +   stp x26, x27, [sp, #208]
> > +
> 
> All these will shift by 16 bytes though
> 
> I am now wondering if it wouldn't be better to create 2 stack frames:
> one for the interrupted function, and one for this function.
> 
> So something like
> 
> stp x29, x9, [sp, #-16]!
> mov x29, sp

That's about the way it was before, when you criticised it was
the wrong way ;-)

> stp x29, x30, [sp, #-(S_FRAME_SIZE + 16]!
> 
> ... store all registers including x29 ...
> 
> and do another mov x29, sp before calling into the handler. That way
> everything should be visible on the call stack when we do a backtrace.

I'm not 100% sure, but I think it already is visible correctly. Note
that the callee has in no way been called yet; control flow is
immediately diverted to the ftrace_caller.

About using SP as a pt_regs pointer: maybe I can free another register
for that purpose and thus achieve conformance *and* pretty code.

> 
> > +   b   ftrace_common
> > +ENDPROC(ftrace_regs_caller)
> > +
> > +ENTRY(ftrace_caller)
> > +   sub sp, sp, #S_FRAME_SIZE
> > +   stp x29, x9, [sp, #-16] /* FP/LR link */
> > +
> 
> Same as above

Yes, Steven demanded 2 entry points :)

> >  /*
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -65,18 +65,66 @@ int ftrace_update_ftrace_func(ftrace_fun
> > return ftrace_modify_code(pc, 0, new, false);
> >  }
> >
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> > +/* Have the assembler generate a known "mov x9,x30" at compile time. */
> > +static void notrace noinline __attribute__((used)) mov_x9_x30(void)
> > +{
> > +   asm(" .global insn_mov_x9_x30\n"
> > +"insn_mov_x9_x30: mov x9,x30\n" : : : "x9");
> > +}
> 
> You cannot rely on the compiler putting the mov at the beginning. I

As you can see from the asm inline, I tried the more precise assembler
label, but it didn't work out. With enough optimisation, the mov _is_
first; but you're right, it's not a good idea to rely on that.

> think some well commented #define should do for the opcode (or did you
> just remove that?)

Alas, yes I did. I had a define, then run-time generation, and now this
assembler hack. Looking at the 3, the define would be best, I'd say.

Torsten



Re: [PATCH v3 1/4] DYNAMIC_FTRACE configurable with and without REGS

2018-10-01 Thread Torsten Duwe
On Mon, Oct 01, 2018 at 05:06:06PM +0200, Ard Biesheuvel wrote:
> On 1 October 2018 at 17:03, Torsten Duwe  wrote:
> > On Mon, Oct 01, 2018 at 04:52:27PM +0200, Ard Biesheuvel wrote:
> >>
> >> I guess we now have Kbuild/Kconfig support for this, no? I mean, we
> >> can now show/hide options depending on the capabilities of the
> >> toolchain.
> >
> > Config options depending on flags availability?
> >
> Yes. Note that 'make menuconfig' now prints the compiler version at
> the top, and kconfig options can now 'depend' on compiler features,

Ah, cool, got it. So unless anyone else thinks this patch is useful,
feel free to disregard it ;-) Point taken.

Torsten



Re: [PATCH v3 1/4] DYNAMIC_FTRACE configurable with and without REGS

2018-10-01 Thread Torsten Duwe
On Mon, Oct 01, 2018 at 04:52:27PM +0200, Ard Biesheuvel wrote:
> 
> I guess we now have Kbuild/Kconfig support for this, no? I mean, we
> can now show/hide options depending on the capabilities of the
> toolchain.

Config options depending on flags availability?

> I am not saying it would be a better approach, though - I'd rather
> have a warning than have things silently disabled, but other people
> may think differently.

Even then this switch has to be enabled, no matter who or what sets it.
Note that this patch leaves everything as-is. Only arm64 users with
"old" compilers would need to take action.

Torsten



[PATCH v3 4/4] arm64: reliable stacktraces

2018-10-01 Thread Torsten Duwe
Make unwind_frame() report whether it had to stop normally or due to
an error condition; walk_stackframe() will pass that info.
__save_stack_trace() is used to check the validity of a frame;
save_stack_trace_tsk_reliable() can now trivially be implemented.
Modify arch/arm64/kernel/time.c for the new semantics.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,8 +127,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
+   select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -33,7 +33,7 @@ struct stackframe {
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+extern int walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
 
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -40,6 +40,13 @@
  * ldp x29, x30, [sp]
  * add sp, sp, #0x10
  */
+
+/*
+ * unwind_frame -- unwind a single stack frame.
+ * Returns 0 when there are more frames to go.
+ * 1 means reached end of stack; negative (error)
+ * means stopped because information is not reliable.
+ */
 int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 {
unsigned long fp = frame->fp;
@@ -81,23 +88,27 @@ int notrace unwind_frame(struct task_str
 * both are NULL.
 */
if (!frame->fp && !frame->pc)
-   return -EINVAL;
+   return 1;
 
return 0;
 }
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+int notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 int (*fn)(struct stackframe *, void *), void *data)
 {
while (1) {
int ret;
 
-   if (fn(frame, data))
-   break;
+   ret = fn(frame, data);
+   if (ret)
+   return ret;
ret = unwind_frame(tsk, frame);
if (ret < 0)
+   return ret;
+   if (ret > 0)
break;
}
+   return 0;
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -145,14 +156,15 @@ void save_stack_trace_regs(struct pt_reg
trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
-static noinline void __save_stack_trace(struct task_struct *tsk,
+static noinline int __save_stack_trace(struct task_struct *tsk,
struct stack_trace *trace, unsigned int nosched)
 {
struct stack_trace_data data;
struct stackframe frame;
+   int ret;
 
if (!try_get_task_stack(tsk))
-   return;
+   return -EBUSY;
 
data.trace = trace;
data.skip = trace->skip;
@@ -171,11 +183,12 @@ static noinline void __save_stack_trace(
frame.graph = tsk->curr_ret_stack;
 #endif
 
-   walk_stackframe(tsk, &frame, save_trace, &data);
+   ret = walk_stackframe(tsk, &frame, save_trace, &data);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 
put_task_stack(tsk);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -190,4 +203,12 @@ void save_stack_trace(struct stack_trace
 }
 
 EXPORT_SYMBOL_GPL(save_stack_trace);
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   return __save_stack_trace(tsk, trace, 1);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
+
 #endif
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -56,7 +56,7 @@ unsigned long profile_pc(struct pt_regs
 #endif
do {
int ret = unwind_frame(NULL, &frame);
-   if (ret < 0)
+   if (ret)
return 0;
} while (in_lock_functions(frame.pc));
 


[PATCH v3 3/4] arm64: implement live patching

2018-10-01 Thread Torsten Duwe
Based on ftrace with regs, do the usual thing. Also allocate a
task flag for whatever consistency handling will be used.
Watch out for interactions with the graph tracer.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -119,6 +119,7 @@ config ARM64
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
+   select HAVE_LIVEPATCH
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI
@@ -1349,4 +1350,6 @@ if CRYPTO
 source "arch/arm64/crypto/Kconfig"
 endif
 
+source "kernel/livepatch/Kconfig"
+
 source "lib/Kconfig"
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -76,6 +76,7 @@ void arch_release_task_struct(struct tas
 #define TIF_FOREIGN_FPSTATE3   /* CPU's FP state is not current's */
 #define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK5   /* Check FS is USER_DS on return */
+#define TIF_PATCH_PENDING  6
 #define TIF_NOHZ   7
 #define TIF_SYSCALL_TRACE  8
 #define TIF_SYSCALL_AUDIT  9
@@ -94,6 +95,7 @@ void arch_release_task_struct(struct tas
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE   (1 << TIF_FOREIGN_FPSTATE)
+#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -106,7 +108,8 @@ void arch_release_task_struct(struct tas
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-_TIF_UPROBE | _TIF_FSCHECK)
+_TIF_UPROBE | _TIF_FSCHECK | \
+_TIF_PATCH_PENDING)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
--- /dev/null
+++ b/arch/arm64/include/asm/livepatch.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * livepatch.h - arm64-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2016,2018 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_LIVEPATCH_H
+#define _ASM_ARM64_LIVEPATCH_H
+
+#include 
+#include 
+
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->pc = ip;
+}
+
+#endif /* _ASM_ARM64_LIVEPATCH_H */
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -209,6 +209,9 @@ ftrace_common:
str x9, [sp, #S_LR] /* to pt_regs.r[30] */
/* The program counter just after the ftrace call site */
str lr, [sp, #S_PC]
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   mov x28,lr  /* remember old return address */
+#endif
/* The stack pointer as it was on ftrace_caller entry... */
add x29, sp, #S_FRAME_SIZE
str x29, [sp, #S_SP]
@@ -224,6 +227,16 @@ ftrace_call:
 
bl  ftrace_stub
 
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   /* Is the trace function a live patcher and has messed with
+    * the return address?
+    */
+   ldr x9, [sp, #S_PC+16]
+   cmp x9, x28 /* compare with the value we remembered */
+   /* to not call graph tracer's "call" mechanism twice! */
+   b.ne    ftrace_common_return
+#endif
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
.global ftrace_graph_call
 ftrace_graph_call: // ftrace_graph_caller();
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -934,6 +935,9 @@ asmlinkage void do_notify_resume(struct
if (thread_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
 
+ 
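
[Editorial note: the archived message is truncated at this point. Judging by
the cover letter item "clear the _TIF_PATCH_PENDING on do_notify_resume()",
the hunk presumably continues along these lines, using the generic helper
klp_update_patch_state(); this is an assumption, not recovered text:

		if (thread_flags & _TIF_PATCH_PENDING)
			klp_update_patch_state(current);
]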

[PATCH v3 2/4] arm64: implement ftrace with regs

2018-10-01 Thread Torsten Duwe
Check for compiler support of -fpatchable-function-entry and use it
to intercept functions immediately on entry, saving the LR in x9.
patchable-function-entry in GCC disables IPA-RA, which means ABI
register calling conventions are obeyed *and* scratch registers are
available.
Disable ftracing in efi/libstub, because this triggers cross-section
linker errors now (-pg is disabled already for those files).
Add an ftrace_caller which can handle LR in x9, as well as an
ftrace_regs_caller that additionally writes out a set of pt_regs
for inspection.
Introduce and handle an ftrace_regs_trampoline for module PLTs.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,7 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -78,6 +78,15 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+  ifeq ($(call cc-option,-fpatchable-function-entry=2),)
+$(error Cannot use CONFIG_DYNAMIC_FTRACE_WITH_REGS: \
+ -fpatchable-function-entry not supported by compiler)
+  endif
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -16,6 +16,17 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/* DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+   of each function, with the second NOP actually calling ftrace. In contrast
+   to a classic _mcount call, the call instruction to be modified is thus
+   the second one, and not the only one. */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
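
[Editorial sketch, not part of this hunk: with REC_IP_BRANCH_OFFSET the
address handed to the patching code is expected to be moved onto the second
NOP before the branch is written. A minimal illustration in the style of
arch/arm64/kernel/ftrace.c follows; the exact shape in the patch may differ,
only aarch64_insn_gen_nop(), aarch64_insn_gen_branch_imm() and the local
ftrace_modify_code() helper are taken from the existing arm64 code:

	int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
	{
		/* rec->ip points at the first NOP, the branch goes into the second */
		unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
		u32 old = aarch64_insn_gen_nop();
		u32 new = aarch64_insn_gen_branch_imm(pc, addr,
						      AARCH64_INSN_BRANCH_LINK);

		return ftrace_modify_code(pc, old, new, true);
	}
]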
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -11,7 +11,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
   -fPIC -fno-strict-aliasing -mno-red-zone \
   -mno-mmx -mno-sse -fshort-wchar
 
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -123,6 +125,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CC_USING_PATCHABLE_FUNCTION_ENTRY
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -162,6 +165,92 @@ ftrace_graph_call: // ftrace_graph_cal
 
mcount_exit
 ENDPROC(ftrace_caller)
+#else /* CC_USING_PATCHABLE_FUNCTION_ENTRY */
+
+/* Since no -pg or similar compiler flag is used, there should really be
+   no reference to _mcount; so do not define one. Only a function address
+   is needed in order to refer to it. */
+ENTRY(_mcount)
+   ret /* just in case, prevent any fall through. */
+ENDPROC(_mcount)
+
+ENTRY(ftrace_regs_caller)
+   sub sp, sp, #S_FRAME_SIZE
+   stp x29, x9, [sp, #-16] /* FP/LR link */
+
+   stp x10, x11, [sp, #S_X10]
+   stp x12, x13, [sp, #S_X12]
+   stp x14, x15, [sp, #112]
+   stp x16, x17, [sp, #128]
+   stp x18, x19, [sp, #144]
+   stp x20, x21, [sp, #160]
+   stp x22, x23, [sp, #176]
+   stp x24, x25, [sp, #192]
+   stp x26, x27

[PATCH v3 1/4] DYNAMIC_FTRACE configurable with and without REGS

2018-10-01 Thread Torsten Duwe
In commit 06aeaaeabf69da4, many ftrace-related config options are
consolidated. By accident, I guess, the choice about DYNAMIC_FTRACE
and DYNAMIC_FTRACE_WITH_REGS is no longer available explicitly but
determined by the sole availability on the architecture.

This makes it hard to introduce DYNAMIC_FTRACE_WITH_REGS if it depends
on new compiler features or other new properties of the toolchain
without breaking existing configurations.

This patch turns the def_bool into an actual choice. Should the toolchain
not meet the requirements for _WITH_REGS it can be turned off.

Signed-off-by: Torsten Duwe 


--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -508,9 +508,15 @@ config DYNAMIC_FTRACE
  otherwise has native performance as long as no tracing is active.
 
 config DYNAMIC_FTRACE_WITH_REGS
-   def_bool y
+   bool "Include register content tracking in dynamic ftrace facility"
+   default y
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_REGS
+   help
+ This architecture supports the inspection of register contents,
+ as passed between functions, at the dynamic ftrace points.
+ This is also a prerequisite for Kernel Live Patching (KLP).
+ When in doubt, say Y.
 
 config FUNCTION_PROFILER
bool "Kernel function profiler"


[PATCH v3 0/4] arm64 live patching

2018-10-01 Thread Torsten Duwe
Hi all!

Some substantial changes were requested, so I had to shuffle a few
things around. All the bigger changes are in now.

[Changes from v2]:

* ifeq($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y) instead of ifdef

* "fix" commit 06aeaaeabf69da4. (new patch 1)
  Made DYNAMIC_FTRACE_WITH_REGS a real choice. The current situation
  would be that a linux-4.20 kernel on arm64 should be built with
  gcc >= 8; as in this case, as well as all other archs, the "default y"
  works. Only kernels >= 4.20, arm64, gcc < 8, must change this to "n"
  in order to not be stopped by the Makefile $(error) from patch 2/4.
  You'll then fall back to the DYNAMIC_FTRACE, if selected, like before.

* use some S_X* constants to refer to offsets into pt_regs in assembly.

* have the compiler/assembler generate the mov x9,x30 instruction that
  saves LR at compile time, rather than generate it repeatedly at runtime.

* flip the ftrace_regs_caller stack frame so that it is no longer
  upside down, as Ard remarked. This change broke the graph caller somehow.

* extend handling of the module arch-dependent ftrace trampoline with
  a companion "regs" version.
  
* clear the _TIF_PATCH_PENDING on do_notify_resume()

* took care of arch/arm64/kernel/time.c when changing stack unwinder
  semantics

[TODO]

* use more S_X* constants

* run the full livepatch test suite, especially test apply_relocate_add()
  functionality late after module load.

[Changes from v1]:

* Missing compiler support is now a Makefile error, instead
  of a warning. This will keep the compile log shorter and
  it will thus be easier to spot the problem.

* A separate ftrace_regs_caller. Only that one will write out
  a complete pt_regs, for efficiency.

* Replace the use of X19 with X28 to remember the old PC during
  live patch detection, as only that is saved now for
  non-regs ftrace.

* CONFIG_DYNAMIC_FTRACE_WITH_REGS and CC_USING_PATCHABLE_FUNCTION_ENTRY
  are currently synonymous on arm64, but differentiate better for
  the future when this is no longer the case.

* Clean up "old"/"new" insn value setting vs. #ifdefs.

* #define a INSN_MOV_X9_X30 with suggested aarch64_insn_gen call
  and use that instead of an immediate hex value (a sketch follows below).
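
  [Editorial sketch of such a define, using the insn framework; the register
   and logic-type constants are from arch/arm64/include/asm/insn.h, the exact
   call chosen in the patch may differ:

	#define INSN_MOV_X9_X30						  \
		aarch64_insn_gen_logical_shifted_reg(AARCH64_INSN_REG_9, \
			AARCH64_INSN_REG_ZR, AARCH64_INSN_REG_30, 0,	  \
			AARCH64_INSN_VARIANT_64BIT, AARCH64_INSN_LOGIC_ORR)

   i.e. "mov x9, x30" encoded as ORR x9, xzr, x30, rather than the immediate
   hex value 0xaa1e03e9.]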

Torsten


[PATCH v2 3/3] arm64: reliable stacktraces

2018-08-17 Thread Torsten Duwe
This is more an RFC in the original sense: is this basically
the correct approach? (as I had to tweak the save_stack API a bit).

In particular the code does not detect interrupts and exception
frames, and does not yet check whether the code address is valid.
The latter check would also have to be omitted for the latest frame
on other tasks' stacks. This would require some more tweaking.

unwind_frame() now reports whether we had to stop normally or due to
an error condition; walk_stackframe() will pass that info.
__save_stack_trace() is used for a start to check the validity of a
frame; maybe save_stack_trace_tsk_reliable() will need its own callback.

The question is whether this change, once complete, is sufficient
(as on powerpc) or whether a port of objtool is needed, like x86.

I can dig into this myself and draw conclusions, but I'd prefer
to have some input from ARM people here...
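
[Editorial illustration: with the int return value a consumer, for instance
the livepatch transition code, can treat anything non-zero as "this stack
cannot be trusted". MAX_ENTRIES and the function name below are made up;
only struct stack_trace and save_stack_trace_tsk_reliable() are from the
patched API:

	#define MAX_ENTRIES	64

	static bool task_stack_reliable(struct task_struct *task)
	{
		unsigned long entries[MAX_ENTRIES];
		struct stack_trace trace = {
			.entries	= entries,
			.max_entries	= MAX_ENTRIES,
		};

		/* 0 means the unwinder reached the end of the stack cleanly */
		return save_stack_trace_tsk_reliable(task, &trace) == 0;
	}
]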

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,8 +127,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
+   select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -33,7 +33,7 @@ struct stackframe {
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+extern int walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a060672..fe0dd4745ff3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -81,23 +81,27 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 * both are NULL.
 */
if (!frame->fp && !frame->pc)
-   return -EINVAL;
+   return 1;
 
return 0;
 }
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+int notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 int (*fn)(struct stackframe *, void *), void *data)
 {
while (1) {
int ret;
 
-   if (fn(frame, data))
-   break;
+   ret = fn(frame, data);
+   if (ret)
+   return ret;
ret = unwind_frame(tsk, frame);
if (ret < 0)
+   return ret;
+   if (ret > 0)
break;
}
+   return 0;
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -145,14 +149,15 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
-static noinline void __save_stack_trace(struct task_struct *tsk,
+static noinline int __save_stack_trace(struct task_struct *tsk,
struct stack_trace *trace, unsigned int nosched)
 {
struct stack_trace_data data;
struct stackframe frame;
+   int ret;
 
if (!try_get_task_stack(tsk))
-   return;
+   return -EBUSY;
 
data.trace = trace;
data.skip = trace->skip;
@@ -171,11 +176,12 @@ static noinline void __save_stack_trace(struct task_struct *tsk,
frame.graph = tsk->curr_ret_stack;
 #endif
 
-   walk_stackframe(tsk, &frame, save_trace, &data);
+   ret = walk_stackframe(tsk, &frame, save_trace, &data);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 
put_task_stack(tsk);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -190,4 +196,12 @@ void save_stack_trace(struct stack_trace *trace)
 }
 
 EXPORT_SYMBOL_GPL(save_stack_trace);
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   return __save_stack_trace(tsk, trace, 1);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
+
 #endif



[PATCH v2 1/3] arm64: implement ftrace with regs

2018-08-17 Thread Torsten Duwe
Check for compiler support of -fpatchable-function-entry and use it
to intercept functions immediately on entry, saving the LR in x9.
Disable ftracing in efi/libstub, because this triggers cross-section
linker errors now (-pg is disabled already for those files).
Add an ftrace_caller which can handle LR in x9, as well as an
ftrace_regs_caller that additionally writes out a set of pt_regs
for inspection.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,7 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -78,6 +78,15 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+  ifeq ($(call cc-option,-fpatchable-function-entry=2),)
+$(error Cannot use CONFIG_DYNAMIC_FTRACE_WITH_REGS: \
+ -fpatchable-function-entry not supported by compiler)
+  endif
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -16,6 +16,13 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET 4
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -11,7 +11,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
   -fPIC -fno-strict-aliasing -mno-red-zone \
   -mno-mmx -mno-sse -fshort-wchar
 
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -123,6 +125,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CC_USING_PATCHABLE_FUNCTION_ENTRY
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -162,6 +165,88 @@ ftrace_graph_call: // ftrace_graph_cal
 
mcount_exit
 ENDPROC(ftrace_caller)
+#else /* CC_USING_PATCHABLE_FUNCTION_ENTRY */
+ENTRY(_mcount)
+   mov x10, lr
+   mov lr, x9
+   ret x10
+ENDPROC(_mcount)
+
+ENTRY(ftrace_regs_caller)
+   stp x29, x9, [sp, #-16]!
+   sub sp, sp, #S_FRAME_SIZE
+
+   stp x10, x11, [sp, #80]
+   stp x12, x13, [sp, #96]
+   stp x14, x15, [sp, #112]
+   stp x16, x17, [sp, #128]
+   stp x18, x19, [sp, #144]
+   stp x20, x21, [sp, #160]
+   stp x22, x23, [sp, #176]
+   stp x24, x25, [sp, #192]
+   stp x26, x27, [sp, #208]
+
+   b   ftrace_common
+ENDPROC(ftrace_regs_caller)
+
+ENTRY(ftrace_caller)
+   stp x29, x9, [sp, #-16]!
+   sub sp, sp, #S_FRAME_SIZE
+
+ftrace_common:
+   stp x28, x29, [sp, #224]
+
+   /* save function arguments */
+   stp x0, x1, [sp]
+   stp x2, x3, [sp, #16]
+   stp x4, x5, [sp, #32]
+   stp x6, x7, [sp, #48]
+   stp x8, x9, [sp, #64]
+
+   /* The link Register at callee entry */
+   str x9, [sp, #S_LR]
+   /* The program counter just after the ftrace call site */
+   str lr, [sp, #S_PC]
+   /* The stack pointer as it was on ftrace_caller entry... */
+   add x29, sp, #S_FRAME_SIZE+16

[PATCH v2 2/3] arm64: implement live patching

2018-08-17 Thread Torsten Duwe
Based on ftrace with regs, do the usual thing. Also allocate a
task flag for whatever consistency handling will be used.
Watch out for interactions with the graph tracer.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -119,6 +119,7 @@ config ARM64
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
+   select HAVE_LIVEPATCH
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI
@@ -1349,4 +1350,6 @@ if CRYPTO
 source "arch/arm64/crypto/Kconfig"
 endif
 
+source "kernel/livepatch/Kconfig"
+
 source "lib/Kconfig"
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -76,6 +76,7 @@ void arch_release_task_struct(struct tas
 #define TIF_FOREIGN_FPSTATE3   /* CPU's FP state is not current's */
 #define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK5   /* Check FS is USER_DS on return */
+#define TIF_PATCH_PENDING  6
 #define TIF_NOHZ   7
 #define TIF_SYSCALL_TRACE  8
 #define TIF_SYSCALL_AUDIT  9
@@ -94,6 +95,7 @@ void arch_release_task_struct(struct tas
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE   (1 << TIF_FOREIGN_FPSTATE)
+#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -106,7 +108,8 @@ void arch_release_task_struct(struct tas
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-_TIF_UPROBE | _TIF_FSCHECK)
+_TIF_UPROBE | _TIF_FSCHECK | \
+_TIF_PATCH_PENDING)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
--- /dev/null
+++ b/arch/arm64/include/asm/livepatch.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * livepatch.h - arm64-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2016,2018 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_LIVEPATCH_H
+#define _ASM_ARM64_LIVEPATCH_H
+
+#include 
+#include 
+
+#ifdef CONFIG_LIVEPATCH
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->pc = ip;
+}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _ASM_ARM64_LIVEPATCH_H */
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -209,6 +209,9 @@ ftrace_common:
str x9, [sp, #S_LR]
/* The program counter just after the ftrace call site */
str lr, [sp, #S_PC]
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   mov x28,lr  /* remember old return address */
+#endif
/* The stack pointer as it was on ftrace_caller entry... */
add x29, sp, #S_FRAME_SIZE+16   /* ...is also our new FP */
str x29, [sp, #S_SP]
@@ -224,6 +227,16 @@ ftrace_call:
 
bl  ftrace_stub
 
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   /* Is the trace function a live patcher and has messed with
+    * the return address?
+    */
+   ldr x9, [sp, #S_PC]
+   cmp x9, x28 /* compare with the value we remembered */
+   /* to not call graph tracer's "call" mechanism twice! */
+   b.ne    ftrace_common_return
+#endif
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
.global ftrace_graph_call
 ftrace_graph_call: // ftrace_graph_caller();


[PATCH v2 0/3] arm64 live patching

2018-08-17 Thread Torsten Duwe
Hi all!

Here's v2; I hope I have incorporated all feedback properly.
Julien: #S_X28 + 8 is in, but ftrace_modify_call() is referenced
from generic code: kernel/trace/ftrace.c __ftrace_replace_code()
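
[Editorial sketch: on arm64 the ftrace_modify_call() that the generic
__ftrace_replace_code() expects would swap one branch target for another at
the patch site, along the same lines as ftrace_make_call(); helper names as
in the existing arm64 ftrace code, the exact body in the patch may differ:

	int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
			       unsigned long addr)
	{
		unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
		u32 old, new;

		old = aarch64_insn_gen_branch_imm(pc, old_addr,
						  AARCH64_INSN_BRANCH_LINK);
		new = aarch64_insn_gen_branch_imm(pc, addr,
						  AARCH64_INSN_BRANCH_LINK);

		return ftrace_modify_code(pc, old, new, true);
	}
]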

And I'd *really* like some feedback from ARM/Linaro/... on the stack
unwinder!

[Changes from v1]:

* Missing compiler support is now a Makefile error, instead
  of a warning. This will keep the compile log shorter and
  it will thus be easier to spot the problem.

* A separate ftrace_regs_caller. Only that one will write out
  a complete pt_regs, for efficiency.

* Replace the use of X19 with X28 to remember the old PC during
  live patch detection, as only that is saved now for
  non-regs ftrace.

* CONFIG_DYNAMIC_FTRACE_WITH_REGS and CC_USING_PATCHABLE_FUNCTION_ENTRY
  are currently synonymous on arm64, but differentiate better for
  the future when this is no longer the case.

* Clean up "old"/"new" insn value setting vs. #ifdefs.

* #define a INSN_MOV_X9_X30 with suggested aarch64_insn_gen call
  and use that instead of an immediate hex value.

Torsten


Re: [PATCH 1/3] arm64: implement ftrace with regs

2018-08-15 Thread Torsten Duwe
[working on V2 with your feedback]

On Tue, Aug 14, 2018 at 12:04:33PM -0400, Steven Rostedt wrote:
> On Tue, 14 Aug 2018 09:33:52 +0100
> Julien Thierry  wrote:
> > >> Shouldn't this be an error? The option -fpatchable-function-entry has
> > >> been added to the CC_FLAGS_FTRACE, so any call to the compiler is gonna
> > >> break anyway. Or am I missing something?  

This should be the case.

> OK, I see what you mean. If the resulting build wont boot, then yes
> this should be an error and not a warning.

No, there won't be a binary because the first gcc invocation with
CC_FLAGS_FTRACE will error out.

The alternatives are a makefile-warning followed by a gcc-error or just
a makefile-error. The makefile warning or error should hint at the causing
config option, that's the key point. Beyond that I don't have any preference.

Torsten


[PATCH 3/3] arm64: reliable stacktraces

2018-08-10 Thread Torsten Duwe
This is more an RFC in the original sense: is this basically
the correct approach? (as I had to tweak the API a bit).

In particular the code does not detect interrupts and exception
frames, and does not yet check whether the code address is valid.
The latter check would also have to be omitted for the latest frame
on other tasks' stacks. This would require some more tweaking.

unwind_frame() now reports whether we had to stop normally or due to
an error condition; walk_stackframe() will pass that info.
__save_stack_trace() is used for a start to check the validity of a
frame; maybe save_stack_trace_tsk_reliable() will need its own callback.

Any comments welcome.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,8 +127,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
+   select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -33,7 +33,7 @@ struct stackframe {
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+extern int walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a060672..fe0dd4745ff3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -81,23 +81,27 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 * both are NULL.
 */
if (!frame->fp && !frame->pc)
-   return -EINVAL;
+   return 1;
 
return 0;
 }
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+int notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 int (*fn)(struct stackframe *, void *), void *data)
 {
while (1) {
int ret;
 
-   if (fn(frame, data))
-   break;
+   ret = fn(frame, data);
+   if (ret)
+   return ret;
ret = unwind_frame(tsk, frame);
if (ret < 0)
+   return ret;
+   if (ret > 0)
break;
}
+   return 0;
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -145,14 +149,15 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
-static noinline void __save_stack_trace(struct task_struct *tsk,
+static noinline int __save_stack_trace(struct task_struct *tsk,
struct stack_trace *trace, unsigned int nosched)
 {
struct stack_trace_data data;
struct stackframe frame;
+   int ret;
 
if (!try_get_task_stack(tsk))
-   return;
+   return -EBUSY;
 
data.trace = trace;
data.skip = trace->skip;
@@ -171,11 +176,12 @@ static noinline void __save_stack_trace(struct task_struct *tsk,
frame.graph = tsk->curr_ret_stack;
 #endif
 
-   walk_stackframe(tsk, &frame, save_trace, &data);
+   ret = walk_stackframe(tsk, &frame, save_trace, &data);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 
put_task_stack(tsk);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -190,4 +196,12 @@ void save_stack_trace(struct stack_trace *trace)
 }
 
 EXPORT_SYMBOL_GPL(save_stack_trace);
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   return __save_stack_trace(tsk, trace, 1);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
+
 #endif



[PATCH 3/3] arm64: reliable stacktraces

2018-08-10 Thread Torsten Duwe
This is more an RFC in the original sense: is this basically
the correct approach? (as I had to tweak the API a bit).

In particular the code does not detect interrupts and exception
frames, and does not yet check whether the code address is valid.
The latter check would also have to be omitted for the latest frame
on other tasks' stacks. This would require some more tweaking.

unwind_frame() now reports whether we had to stop normally or due to
an error condition; walk_stackframe() will pass that info.
__save_stack_trace() is used for a start to check the validity of a
frame; maybe save_stack_trace_tsk_reliable() will need its own callback.

Any comments welcome.

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,8 +127,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
-   select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
+   select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_RELIABLE_STACKTRACE
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -33,7 +33,7 @@ struct stackframe {
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+extern int walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
int (*fn)(struct stackframe *, void *), void *data);
 extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk);
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d5718a060672..fe0dd4745ff3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -81,23 +81,27 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 * both are NULL.
 */
if (!frame->fp && !frame->pc)
-   return -EINVAL;
+   return 1;
 
return 0;
 }
 
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
+int notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
 int (*fn)(struct stackframe *, void *), void *data)
 {
while (1) {
int ret;
 
-   if (fn(frame, data))
-   break;
+   ret = fn(frame, data);
+   if (ret)
+   return ret;
ret = unwind_frame(tsk, frame);
if (ret < 0)
+   return ret;
+   if (ret > 0)
break;
}
+   return 0;
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -145,14 +149,15 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 }
 
-static noinline void __save_stack_trace(struct task_struct *tsk,
+static noinline int __save_stack_trace(struct task_struct *tsk,
struct stack_trace *trace, unsigned int nosched)
 {
struct stack_trace_data data;
struct stackframe frame;
+   int ret;
 
if (!try_get_task_stack(tsk))
-   return;
+   return -EBUSY;
 
data.trace = trace;
data.skip = trace->skip;
@@ -171,11 +176,12 @@ static noinline void __save_stack_trace(struct task_struct *tsk,
frame.graph = tsk->curr_ret_stack;
 #endif
 
-   walk_stackframe(tsk, &frame, save_trace, &data);
+   ret = walk_stackframe(tsk, &frame, save_trace, &data);
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
 
put_task_stack(tsk);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -190,4 +196,12 @@ void save_stack_trace(struct stack_trace *trace)
 }
 
 EXPORT_SYMBOL_GPL(save_stack_trace);
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   return __save_stack_trace(tsk, trace, 1);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
+
 #endif
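
For context, and again only as a sketch that is not part of this patch:
the livepatch consistency model is the intended consumer of the new
hook. A caller would look roughly like this (MAX_ENTRIES is an
arbitrary size picked for the example):

#include <linux/sched.h>
#include <linux/stacktrace.h>

#define MAX_ENTRIES 64          /* arbitrary for this sketch */

static int task_stack_reliable(struct task_struct *task)
{
        static unsigned long entries[MAX_ENTRIES];
        struct stack_trace trace = {
                .nr_entries     = 0,
                .skip           = 0,
                .max_entries    = MAX_ENTRIES,
                .entries        = entries,
        };

        /* < 0 means the arch could not produce a trustworthy trace */
        return save_stack_trace_tsk_reliable(task, &trace);
}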



[PATCH 2/3] arm64: implement live patching

2018-08-10 Thread Torsten Duwe
Based on ftrace with regs, do the usual things: select HAVE_LIVEPATCH,
provide the arch-specific asm/livepatch.h, and allocate a task flag
(TIF_PATCH_PENDING) for whatever consistency handling gets implemented.
Watch out for interactions with the graph tracer.
This code has been compile-tested, but has not yet seen any
heavy livepatching.
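
Not part of this patch, but to show where the new flag is headed, here
is a sketch (the hookup into arm64's thread-flag work loop is assumed
here, not implemented yet): with TIF_PATCH_PENDING in _TIF_WORK_MASK,
the generic klp_update_patch_state() is the intended consumer on the
return-to-user path.

#include <linux/livepatch.h>
#include <linux/sched.h>
#include <linux/thread_info.h>

/* Hypothetical excerpt of the do_notify_resume()-style work loop. */
static void handle_patch_pending(unsigned long thread_flags)
{
        if (thread_flags & _TIF_PATCH_PENDING)
                klp_update_patch_state(current);  /* migrate this task */
}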

Signed-off-by: Torsten Duwe 

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 28c8c03..31df287 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -117,6 +117,7 @@ config ARM64
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
+   select HAVE_LIVEPATCH
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI
@@ -1343,4 +1344,6 @@ if CRYPTO
 source "arch/arm64/crypto/Kconfig"
 endif
 
+source "kernel/livepatch/Kconfig"
+
 source "lib/Kconfig"
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -76,6 +76,7 @@ void arch_release_task_struct(struct tas
 #define TIF_FOREIGN_FPSTATE3   /* CPU's FP state is not current's */
 #define TIF_UPROBE 4   /* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK5   /* Check FS is USER_DS on return */
+#define TIF_PATCH_PENDING  6
 #define TIF_NOHZ   7
 #define TIF_SYSCALL_TRACE  8
 #define TIF_SYSCALL_AUDIT  9
@@ -94,6 +95,7 @@ void arch_release_task_struct(struct tas
 #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE   (1 << TIF_FOREIGN_FPSTATE)
+#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -106,7 +108,8 @@ void arch_release_task_struct(struct tas
 
 #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-_TIF_UPROBE | _TIF_FSCHECK)
+_TIF_UPROBE | _TIF_FSCHECK | \
+_TIF_PATCH_PENDING)
 
 #define _TIF_SYSCALL_WORK  (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/include/asm/livepatch.h b/arch/arm64/include/asm/livepatch.h
new file mode 100644
index 000..6b9a3d1
--- /dev/null
+++ b/arch/arm64/include/asm/livepatch.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * livepatch.h - arm64-specific Kernel Live Patching Core
+ *
+ * Copyright (C) 2016 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_LIVEPATCH_H
+#define _ASM_ARM64_LIVEPATCH_H
+
+#include 
+#include 
+
+#ifdef CONFIG_LIVEPATCH
+static inline int klp_check_compiler_support(void)
+{
+   return 0;
+}
+
+static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
+{
+   regs->pc = ip;
+}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _ASM_ARM64_LIVEPATCH_H */
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index cdb5866..cf640e8 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -195,6 +195,9 @@ ENTRY(ftrace_caller)
str x9, [sp, #S_LR]
/* The program counter just after the ftrace call site */
str lr, [sp, #S_PC]
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   mov x19, lr /* remember old return address */
+#endif
/* The stack pointer as it was on ftrace_caller entry... */
add x29, sp, #S_FRAME_SIZE+16   /* ...is also our new FP */
str x29, [sp, #S_SP]
@@ -210,6 +213,16 @@ ftrace_call:
 
bl  ftrace_stub
 
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
+   /* Is the trace function a live patcher and has it messed with
+    * the return address?
+    */
+   ldr x9, [sp, #S_PC]
+   cmp x9, x19 /* compare with the value we remembered */
+   /* to not call graph tracer's "call" mechanism twice! */
+   b.ne    ftrace_regs


[PATCH 1/3] arm64: implement ftrace with regs

2018-08-10 Thread Torsten Duwe
Check for compiler support of -fpatchable-function-entry and use it
to intercept functions immediately on entry, saving the LR in x9.
Disable ftracing in efi/libstub, because it now triggers cross-section
linker errors (-pg was already disabled there).
Add an ftrace_caller that can handle the LR in x9.
This code has successfully passed the FTRACE_STARTUP_TEST.
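
As a sketch of what the two patchable NOPs buy us (not part of the
patch; patch_site() is a made-up helper): the mcount location recorded
by ftrace points at the first NOP, the first NOP becomes the
"mov x9, lr", and the branch to ftrace_caller is written into the
second slot, which is why REC_IP_BRANCH_OFFSET is one instruction.

#include <linux/ftrace.h>
#include <asm/ftrace.h>         /* REC_IP_BRANCH_OFFSET */

/* Hypothetical helper: the instruction ftrace actually patches. */
static unsigned long patch_site(struct dyn_ftrace *rec)
{
        return rec->ip + REC_IP_BRANCH_OFFSET;
}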

Signed-off-by: Torsten Duwe 

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,7 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -78,6 +78,15 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+  KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
+  ifeq ($(call cc-option,-fpatchable-function-entry=2),)
+$(warning Cannot use CONFIG_DYNAMIC_FTRACE_WITH_REGS: \
+ -fpatchable-function-entry not supported by compiler)
+  endif
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -16,6 +16,14 @@
 #define MCOUNT_ADDR((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET 4
+#define FTRACE_REGS_ADDR FTRACE_ADDR
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
arm64-obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -11,7 +11,8 @@
   -fPIC -fno-strict-aliasing -mno-red-zone \
   -mno-mmx -mno-sse -fshort-wchar
 
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic -mno-single-pic-base
 
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -123,6 +125,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 /*
  * _mcount() is used to build the kernel with -pg option, but all the branch
  * instructions to _mcount() are replaced to NOP initially at kernel start up,
@@ -162,6 +165,84 @@ ftrace_graph_call: // ftrace_graph_cal
 
mcount_exit
 ENDPROC(ftrace_caller)
+#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+ENTRY(_mcount)
+   mov x10, lr
+   mov lr, x9
+   ret x10
+ENDPROC(_mcount)
+
+ENTRY(ftrace_caller)
+   stp x29, x9, [sp, #-16]!
+   sub sp, sp, #S_FRAME_SIZE
+
+   stp x0, x1, [sp]
+   stp x2, x3, [sp, #16]
+   stp x4, x5, [sp, #32]
+   stp x6, x7, [sp, #48]
+   stp x8, x9, [sp, #64]
+   stp x10, x11, [sp, #80]
+   stp x12, x13, [sp, #96]
+   stp x14, x15, [sp, #112]
+   stp x16, x17, [sp, #128]
+   stp x18, x19, [sp, #144]
+   stp x20, x21, [sp, #160]
+   stp x22, x23, [sp, #176]
+   stp x24, x25, [sp, #192]
+   stp x26, x27, [sp, #208]
+   stp x28, x29, [sp, #224]
+   /* The link Register at callee entry */
+   str x9, [sp, #S_LR]
+   /* The program counter just after the ftrace call site */
+   str lr, [sp, #S_PC]
+   /* The stack pointer as it was on ftrace_caller entry... */
+   add x29, sp, #S_FRAME_SIZE+16   /* ...is also our new FP */
+   str x29, [sp, #S_SP]
+
+   adrp    x0, function_trace_op
+   ldr x2, [x0, #:lo12:function_trace_op]
+   mov x1, x9  /* saved LR == parent IP */
+   sub x0, lr, #8  /* function entry == IP

  1   2   3   4   5   6   7   >