subject:"Re\: \[POC\]\[RFC\]\[PATCH 1\/2\] jump_function\: Addition of new feature \"jump

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-11 Thread Andy Lutomirski




> On Oct 11, 2018, at 5:52 AM, Josh Poimboeuf  wrote:
> 
>> On Wed, Oct 10, 2018 at 10:07:38PM -0500, Josh Poimboeuf wrote:
>>> On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
 On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  
 wrote:
 
> On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
>>> +#define DECLARE_STATIC_CALL(tramp, func)   
>>> \
>>> +   extern typeof(func) tramp;  
>>> \
>>> +   static void __used __section(.discard.static_call_tramps)   
>>> \
>>> +   *__static_call_tramp_##tramp = tramp
>>> +
>> 
>> Confused.  What's the __static_call_tramp_##tramp variable for?  And
>> why is a DECLARE_ macro defining a variable?
> 
> This is the magic needed for objtool to find all the call sites.
> 
> The variable itself isn't needed, but the .discard.static_call_tramps
> entry is.  Objtool reads that section to find out which function call
> sites are targeted to a static call trampoline.
 
 To clarify: objtool reads that section to find out which functions are
 really static call trampolines.  Then it annotates all the instructions
 which call/jmp to those trampolines.  Those annotations are then read by
 the kernel.
 
>>> 
>>> Ah, right, and objtool runs on a per-object basis so it has no other
>>> way to know what symbols are actually static calls.
>>> 
>>> There's another way to skin this cat, though:
>>> 
>>> extern typeof(func) __static_call_trampoline_##tramp;
>>> #define tramp __static_call_trampoline_##tramp
>>> 
>>> And objtool could recognize it by name.  But, of course, you can't put
>>> a #define in a macro.  But maybe there's a way to hack it up with a
>>> static inline?
> 
> I went with something similar in the latest version.  It's less
> surprising in a couple of ways:
> 
> - DECLARE_STATIC_CALL doesn't have the magic objtool definition.
> 
> - Call sites use the static_call() wrapper, which makes static calls
>  clearly visible.

Seems reasonable. Also, for a real patch, it should be straightforward to have 
a fallback implementation in include/linux/static_call.h that just dereferences 
the pointer.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-11 Thread Andy Lutomirski




> On Oct 11, 2018, at 5:52 AM, Josh Poimboeuf  wrote:
> 
>> On Wed, Oct 10, 2018 at 10:07:38PM -0500, Josh Poimboeuf wrote:
>>> On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
 On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  
 wrote:
 
> On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
>>> +#define DECLARE_STATIC_CALL(tramp, func)   
>>> \
>>> +   extern typeof(func) tramp;  
>>> \
>>> +   static void __used __section(.discard.static_call_tramps)   
>>> \
>>> +   *__static_call_tramp_##tramp = tramp
>>> +
>> 
>> Confused.  What's the __static_call_tramp_##tramp variable for?  And
>> why is a DECLARE_ macro defining a variable?
> 
> This is the magic needed for objtool to find all the call sites.
> 
> The variable itself isn't needed, but the .discard.static_call_tramps
> entry is.  Objtool reads that section to find out which function call
> sites are targeted to a static call trampoline.
 
 To clarify: objtool reads that section to find out which functions are
 really static call trampolines.  Then it annotates all the instructions
 which call/jmp to those trampolines.  Those annotations are then read by
 the kernel.
 
>>> 
>>> Ah, right, and objtool runs on a per-object basis so it has no other
>>> way to know what symbols are actually static calls.
>>> 
>>> There's another way to skin this cat, though:
>>> 
>>> extern typeof(func) __static_call_trampoline_##tramp;
>>> #define tramp __static_call_trampoline_##tramp
>>> 
>>> And objtool could recognize it by name.  But, of course, you can't put
>>> a #define in a macro.  But maybe there's a way to hack it up with a
>>> static inline?
> 
> I went with something similar in the latest version.  It's less
> surprising in a couple of ways:
> 
> - DECLARE_STATIC_CALL doesn't have the magic objtool definition.
> 
> - Call sites use the static_call() wrapper, which makes static calls
>  clearly visible.

Seems reasonable. Also, for a real patch, it should be straightforward to have 
a fallback implementation in include/linux/static_call.h that just dereferences 
the pointer.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-11 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 10:07:38PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
> > On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
> > >
> > > On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > > > +#define DECLARE_STATIC_CALL(tramp, func)   
> > > > > > \
> > > > > > +   extern typeof(func) tramp;  
> > > > > > \
> > > > > > +   static void __used __section(.discard.static_call_tramps)   
> > > > > > \
> > > > > > +   *__static_call_tramp_##tramp = tramp
> > > > > > +
> > > > >
> > > > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > > > why is a DECLARE_ macro defining a variable?
> > > >
> > > > This is the magic needed for objtool to find all the call sites.
> > > >
> > > > The variable itself isn't needed, but the .discard.static_call_tramps
> > > > entry is.  Objtool reads that section to find out which function call
> > > > sites are targeted to a static call trampoline.
> > >
> > > To clarify: objtool reads that section to find out which functions are
> > > really static call trampolines.  Then it annotates all the instructions
> > > which call/jmp to those trampolines.  Those annotations are then read by
> > > the kernel.
> > >
> > 
> > Ah, right, and objtool runs on a per-object basis so it has no other
> > way to know what symbols are actually static calls.
> > 
> > There's another way to skin this cat, though:
> > 
> > extern typeof(func) __static_call_trampoline_##tramp;
> > #define tramp __static_call_trampoline_##tramp
> > 
> > And objtool could recognize it by name.  But, of course, you can't put
> > a #define in a macro.  But maybe there's a way to hack it up with a
> > static inline?

I went with something similar in the latest version.  It's less
surprising in a couple of ways:

- DECLARE_STATIC_CALL doesn't have the magic objtool definition.

- Call sites use the static_call() wrapper, which makes static calls
  clearly visible.


#define STATIC_CALL_TRAMP(key) static_call_tramp_##key
#define STATIC_CALL_PTR(key)   static_call_ptr_##key

#define STATIC_CALL_TRAMP_STR(key) __stringify(STATIC_CALL_TRAMP(key))
#define STATIC_CALL_PTR_STR(key) __stringify(STATIC_CALL_PTR(key))

#define DECLARE_STATIC_CALL(key, func)  \
extern typeof(func)  STATIC_CALL_TRAMP(key);\
extern typeof(func) *STATIC_CALL_PTR(key)

#define DEFINE_STATIC_CALL(key, func)   \
typeof(func) *STATIC_CALL_PTR(key) = func;  \
asm(".pushsection .text, \"ax\" \n" \
".align 4   \n" \
".globl " STATIC_CALL_TRAMP_STR(key) "  \n" \
".type " STATIC_CALL_TRAMP_STR(key) ", @function\n" \
STATIC_CALL_TRAMP_STR(key) ":   \n" \
"jmpq *" STATIC_CALL_PTR_STR(key) "(%rip)   \n" \
".popsection\n")

#define static_call(key, args...) STATIC_CALL_TRAMP(key)(args)

#define static_call_update(key, func)   \
({  \
STATIC_CALL_PTR(key) = func;\
__static_call_update((void **)_CALL_PTR(key));   \
})

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-11 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 10:07:38PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
> > On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
> > >
> > > On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > > > +#define DECLARE_STATIC_CALL(tramp, func)   
> > > > > > \
> > > > > > +   extern typeof(func) tramp;  
> > > > > > \
> > > > > > +   static void __used __section(.discard.static_call_tramps)   
> > > > > > \
> > > > > > +   *__static_call_tramp_##tramp = tramp
> > > > > > +
> > > > >
> > > > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > > > why is a DECLARE_ macro defining a variable?
> > > >
> > > > This is the magic needed for objtool to find all the call sites.
> > > >
> > > > The variable itself isn't needed, but the .discard.static_call_tramps
> > > > entry is.  Objtool reads that section to find out which function call
> > > > sites are targeted to a static call trampoline.
> > >
> > > To clarify: objtool reads that section to find out which functions are
> > > really static call trampolines.  Then it annotates all the instructions
> > > which call/jmp to those trampolines.  Those annotations are then read by
> > > the kernel.
> > >
> > 
> > Ah, right, and objtool runs on a per-object basis so it has no other
> > way to know what symbols are actually static calls.
> > 
> > There's another way to skin this cat, though:
> > 
> > extern typeof(func) __static_call_trampoline_##tramp;
> > #define tramp __static_call_trampoline_##tramp
> > 
> > And objtool could recognize it by name.  But, of course, you can't put
> > a #define in a macro.  But maybe there's a way to hack it up with a
> > static inline?

I went with something similar in the latest version.  It's less
surprising in a couple of ways:

- DECLARE_STATIC_CALL doesn't have the magic objtool definition.

- Call sites use the static_call() wrapper, which makes static calls
  clearly visible.


#define STATIC_CALL_TRAMP(key) static_call_tramp_##key
#define STATIC_CALL_PTR(key)   static_call_ptr_##key

#define STATIC_CALL_TRAMP_STR(key) __stringify(STATIC_CALL_TRAMP(key))
#define STATIC_CALL_PTR_STR(key) __stringify(STATIC_CALL_PTR(key))

#define DECLARE_STATIC_CALL(key, func)  \
extern typeof(func)  STATIC_CALL_TRAMP(key);\
extern typeof(func) *STATIC_CALL_PTR(key)

#define DEFINE_STATIC_CALL(key, func)   \
typeof(func) *STATIC_CALL_PTR(key) = func;  \
asm(".pushsection .text, \"ax\" \n" \
".align 4   \n" \
".globl " STATIC_CALL_TRAMP_STR(key) "  \n" \
".type " STATIC_CALL_TRAMP_STR(key) ", @function\n" \
STATIC_CALL_TRAMP_STR(key) ":   \n" \
"jmpq *" STATIC_CALL_PTR_STR(key) "(%rip)   \n" \
".popsection\n")

#define static_call(key, args...) STATIC_CALL_TRAMP(key)(args)

#define static_call_update(key, func)   \
({  \
STATIC_CALL_PTR(key) = func;\
__static_call_update((void **)_CALL_PTR(key));   \
})

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
> >
> > On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > > +#define DECLARE_STATIC_CALL(tramp, func) 
> > > > >   \
> > > > > +   extern typeof(func) tramp;
> > > > >   \
> > > > > +   static void __used __section(.discard.static_call_tramps) 
> > > > >   \
> > > > > +   *__static_call_tramp_##tramp = tramp
> > > > > +
> > > >
> > > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > > why is a DECLARE_ macro defining a variable?
> > >
> > > This is the magic needed for objtool to find all the call sites.
> > >
> > > The variable itself isn't needed, but the .discard.static_call_tramps
> > > entry is.  Objtool reads that section to find out which function call
> > > sites are targeted to a static call trampoline.
> >
> > To clarify: objtool reads that section to find out which functions are
> > really static call trampolines.  Then it annotates all the instructions
> > which call/jmp to those trampolines.  Those annotations are then read by
> > the kernel.
> >
> 
> Ah, right, and objtool runs on a per-object basis so it has no other
> way to know what symbols are actually static calls.
> 
> There's another way to skin this cat, though:
> 
> extern typeof(func) __static_call_trampoline_##tramp;
> #define tramp __static_call_trampoline_##tramp
> 
> And objtool could recognize it by name.  But, of course, you can't put
> a #define in a macro.  But maybe there's a way to hack it up with a
> static inline?

Not sure how to do that...

> Anyway, your way is probably fine with a few caveats:
> 
>  - It won't really work if the call comes from a .S file.

Maybe we can have similar macros for asm code.

>  - There should probably be a comment to help de-confuse future people
> like me :)

Done.  I also converted the trampoline to use an indirect jump.  The
latest code is below.  I'm going off the grid this weekend but I can
probably post proper patches next week.

diff --git a/arch/Kconfig b/arch/Kconfig
index 9d329608913e..20ff5624dad7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
  architectures, and don't require runtime relocation on relocatable
  kernels.
 
+config HAVE_ARCH_STATIC_CALL
+   bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5136a1281870..1a14c8f87876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,6 +128,7 @@ config X86
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_STATIC_CALLif X86_64
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/static_call.h 
b/arch/x86/include/asm/static_call.h
new file mode 100644
index ..50006bcc3352
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+#ifdef CONFIG_X86_64
+
+#include 
+
+void static_call_init(void);
+extern void __static_call_update(void **func_ptr, void *func);
+
+#define STATIC_CALL_PTR(tramp) (__static_call_ptr_##tramp)
+
+/*
+ * In addition to declaring external variables, this macro also spits out some
+ * magic for objtool to read.  The .discard.static_call_tramps section tells
+ * objtool which functions are static call trampolines, so it can then annotate
+ * calls to those functions in the __static_call_table section.
+ */
+#define DECLARE_STATIC_CALL(tramp, func)   \
+   extern typeof(func) tramp;  \
+   extern typeof(func) *STATIC_CALL_PTR(tramp);\
+   static void __used __section(.discard.static_call_tramps)   \
+   *__static_call_tramp_##tramp = tramp
+
+#define DEFINE_STATIC_CALL(tramp, func)
\
+   DECLARE_STATIC_CALL(tramp, func);   \
+   typeof(func) *STATIC_CALL_PTR(tramp) = func;\
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 4   \n" \
+   ".globl " #tramp "  \n" \
+   ".type " #tramp ", @function\n" \
+   #tramp ":   \n" \
+   "jmpq *" __stringify(STATIC_CALL_PTR(tramp)) "(%rip)\n" \
+   ".popsection

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 02:13:22PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
> >
> > On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > > +#define DECLARE_STATIC_CALL(tramp, func) 
> > > > >   \
> > > > > +   extern typeof(func) tramp;
> > > > >   \
> > > > > +   static void __used __section(.discard.static_call_tramps) 
> > > > >   \
> > > > > +   *__static_call_tramp_##tramp = tramp
> > > > > +
> > > >
> > > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > > why is a DECLARE_ macro defining a variable?
> > >
> > > This is the magic needed for objtool to find all the call sites.
> > >
> > > The variable itself isn't needed, but the .discard.static_call_tramps
> > > entry is.  Objtool reads that section to find out which function call
> > > sites are targeted to a static call trampoline.
> >
> > To clarify: objtool reads that section to find out which functions are
> > really static call trampolines.  Then it annotates all the instructions
> > which call/jmp to those trampolines.  Those annotations are then read by
> > the kernel.
> >
> 
> Ah, right, and objtool runs on a per-object basis so it has no other
> way to know what symbols are actually static calls.
> 
> There's another way to skin this cat, though:
> 
> extern typeof(func) __static_call_trampoline_##tramp;
> #define tramp __static_call_trampoline_##tramp
> 
> And objtool could recognize it by name.  But, of course, you can't put
> a #define in a macro.  But maybe there's a way to hack it up with a
> static inline?

Not sure how to do that...

> Anyway, your way is probably fine with a few caveats:
> 
>  - It won't really work if the call comes from a .S file.

Maybe we can have similar macros for asm code.

>  - There should probably be a comment to help de-confuse future people
> like me :)

Done.  I also converted the trampoline to use an indirect jump.  The
latest code is below.  I'm going off the grid this weekend but I can
probably post proper patches next week.

diff --git a/arch/Kconfig b/arch/Kconfig
index 9d329608913e..20ff5624dad7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
  architectures, and don't require runtime relocation on relocatable
  kernels.
 
+config HAVE_ARCH_STATIC_CALL
+   bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5136a1281870..1a14c8f87876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,6 +128,7 @@ config X86
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_STATIC_CALLif X86_64
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/static_call.h 
b/arch/x86/include/asm/static_call.h
new file mode 100644
index ..50006bcc3352
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+#ifdef CONFIG_X86_64
+
+#include 
+
+void static_call_init(void);
+extern void __static_call_update(void **func_ptr, void *func);
+
+#define STATIC_CALL_PTR(tramp) (__static_call_ptr_##tramp)
+
+/*
+ * In addition to declaring external variables, this macro also spits out some
+ * magic for objtool to read.  The .discard.static_call_tramps section tells
+ * objtool which functions are static call trampolines, so it can then annotate
+ * calls to those functions in the __static_call_table section.
+ */
+#define DECLARE_STATIC_CALL(tramp, func)   \
+   extern typeof(func) tramp;  \
+   extern typeof(func) *STATIC_CALL_PTR(tramp);\
+   static void __used __section(.discard.static_call_tramps)   \
+   *__static_call_tramp_##tramp = tramp
+
+#define DEFINE_STATIC_CALL(tramp, func)
\
+   DECLARE_STATIC_CALL(tramp, func);   \
+   typeof(func) *STATIC_CALL_PTR(tramp) = func;\
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 4   \n" \
+   ".globl " #tramp "  \n" \
+   ".type " #tramp ", @function\n" \
+   #tramp ":   \n" \
+   "jmpq *" __stringify(STATIC_CALL_PTR(tramp)) "(%rip)\n" \
+   ".popsection

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
>
> On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > +#define DECLARE_STATIC_CALL(tramp, func)   
> > > > \
> > > > +   extern typeof(func) tramp;  
> > > > \
> > > > +   static void __used __section(.discard.static_call_tramps)   
> > > > \
> > > > +   *__static_call_tramp_##tramp = tramp
> > > > +
> > >
> > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > why is a DECLARE_ macro defining a variable?
> >
> > This is the magic needed for objtool to find all the call sites.
> >
> > The variable itself isn't needed, but the .discard.static_call_tramps
> > entry is.  Objtool reads that section to find out which function call
> > sites are targeted to a static call trampoline.
>
> To clarify: objtool reads that section to find out which functions are
> really static call trampolines.  Then it annotates all the instructions
> which call/jmp to those trampolines.  Those annotations are then read by
> the kernel.
>

Ah, right, and objtool runs on a per-object basis so it has no other
way to know what symbols are actually static calls.

There's another way to skin this cat, though:

extern typeof(func) __static_call_trampoline_##tramp;
#define tramp __static_call_trampoline_##tramp

And objtool could recognize it by name.  But, of course, you can't put
a #define in a macro.  But maybe there's a way to hack it up with a
static inline?

Anyway, your way is probably fine with a few caveats:

 - It won't really work if the call comes from a .S file.
 - There should probably be a comment to help de-confuse future people
like me :)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 11:17 AM Josh Poimboeuf  wrote:
>
> On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > > +#define DECLARE_STATIC_CALL(tramp, func)   
> > > > \
> > > > +   extern typeof(func) tramp;  
> > > > \
> > > > +   static void __used __section(.discard.static_call_tramps)   
> > > > \
> > > > +   *__static_call_tramp_##tramp = tramp
> > > > +
> > >
> > > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > > why is a DECLARE_ macro defining a variable?
> >
> > This is the magic needed for objtool to find all the call sites.
> >
> > The variable itself isn't needed, but the .discard.static_call_tramps
> > entry is.  Objtool reads that section to find out which function call
> > sites are targeted to a static call trampoline.
>
> To clarify: objtool reads that section to find out which functions are
> really static call trampolines.  Then it annotates all the instructions
> which call/jmp to those trampolines.  Those annotations are then read by
> the kernel.
>

Ah, right, and objtool runs on a per-object basis so it has no other
way to know what symbols are actually static calls.

There's another way to skin this cat, though:

extern typeof(func) __static_call_trampoline_##tramp;
#define tramp __static_call_trampoline_##tramp

And objtool could recognize it by name.  But, of course, you can't put
a #define in a macro.  But maybe there's a way to hack it up with a
static inline?

Anyway, your way is probably fine with a few caveats:

 - It won't really work if the call comes from a .S file.
 - There should probably be a comment to help de-confuse future people
like me :)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 1:16 PM Josh Poimboeuf  wrote:
>
> On Wed, Oct 10, 2018 at 02:56:08PM -0400, Steven Rostedt wrote:
> > On Wed, 10 Oct 2018 13:33:30 -0500
> > Josh Poimboeuf  wrote:
> >
> > > Re-reading your suggestion, I may have misunderstood what you're
> > > suggesting here, but I'm thinking about doing something like what you
> > > proposed earlier:
> > >
> > > GLOBAL(tramp)
> > >   jmp *current_func(%rip)
> > > ENDPROC(tramp)
> > >
> > > That is, doing an indirect jump instead of the above direct jump, so
> > > that any previous references to the trampoline would still work (and it
> > > would also work during early boot).
> > >
> > > Though it should probably be a retpoline instead of an indirect jump.
> >
> > But do we care, as it only takes place during text_poke_bp() right?
> >
> > I don't think we need to worry about training trampoline branch
> > prediction that can only be hit when something enables the jump.
>
> Yeah, I guess it depends on if we'd expect anybody (or gcc) to get a
> function pointer to the trampoline itself.  I can just create a warning
> for that in objtool.
>

The jmp * in the trampoline itself is harmless even with Spectre
because it won't ever execute -- static_call_init() should just patch
it out even if the actual call target is never updated.  And gcc has
no business generating any unprotected indirect branches to it from
anywhere else, since, as far as gcc is concerned, they're just like
indirect branches to any other function.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 1:16 PM Josh Poimboeuf  wrote:
>
> On Wed, Oct 10, 2018 at 02:56:08PM -0400, Steven Rostedt wrote:
> > On Wed, 10 Oct 2018 13:33:30 -0500
> > Josh Poimboeuf  wrote:
> >
> > > Re-reading your suggestion, I may have misunderstood what you're
> > > suggesting here, but I'm thinking about doing something like what you
> > > proposed earlier:
> > >
> > > GLOBAL(tramp)
> > >   jmp *current_func(%rip)
> > > ENDPROC(tramp)
> > >
> > > That is, doing an indirect jump instead of the above direct jump, so
> > > that any previous references to the trampoline would still work (and it
> > > would also work during early boot).
> > >
> > > Though it should probably be a retpoline instead of an indirect jump.
> >
> > But do we care, as it only takes place during text_poke_bp() right?
> >
> > I don't think we need to worry about training trampoline branch
> > prediction that can only be hit when something enables the jump.
>
> Yeah, I guess it depends on if we'd expect anybody (or gcc) to get a
> function pointer to the trampoline itself.  I can just create a warning
> for that in objtool.
>

The jmp * in the trampoline itself is harmless even with Spectre
because it won't ever execute -- static_call_init() should just patch
it out even if the actual call target is never updated.  And gcc has
no business generating any unprotected indirect branches to it from
anywhere else, since, as far as gcc is concerned, they're just like
indirect branches to any other function.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 02:56:08PM -0400, Steven Rostedt wrote:
> On Wed, 10 Oct 2018 13:33:30 -0500
> Josh Poimboeuf  wrote:
> 
> > Re-reading your suggestion, I may have misunderstood what you're
> > suggesting here, but I'm thinking about doing something like what you
> > proposed earlier:
> > 
> > GLOBAL(tramp)
> >   jmp *current_func(%rip)
> > ENDPROC(tramp)
> > 
> > That is, doing an indirect jump instead of the above direct jump, so
> > that any previous references to the trampoline would still work (and it
> > would also work during early boot).
> > 
> > Though it should probably be a retpoline instead of an indirect jump.
> 
> But do we care, as it only takes place during text_poke_bp() right?
> 
> I don't think we need to worry about training trampoline branch
> prediction that can only be hit when something enables the jump.

Yeah, I guess it depends on if we'd expect anybody (or gcc) to get a
function pointer to the trampoline itself.  I can just create a warning
for that in objtool.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 02:56:08PM -0400, Steven Rostedt wrote:
> On Wed, 10 Oct 2018 13:33:30 -0500
> Josh Poimboeuf  wrote:
> 
> > Re-reading your suggestion, I may have misunderstood what you're
> > suggesting here, but I'm thinking about doing something like what you
> > proposed earlier:
> > 
> > GLOBAL(tramp)
> >   jmp *current_func(%rip)
> > ENDPROC(tramp)
> > 
> > That is, doing an indirect jump instead of the above direct jump, so
> > that any previous references to the trampoline would still work (and it
> > would also work during early boot).
> > 
> > Though it should probably be a retpoline instead of an indirect jump.
> 
> But do we care, as it only takes place during text_poke_bp() right?
> 
> I don't think we need to worry about training trampoline branch
> prediction that can only be hit when something enables the jump.

Yeah, I guess it depends on if we'd expect anybody (or gcc) to get a
function pointer to the trampoline itself.  I can just create a warning
for that in objtool.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Steven Rostedt

On Wed, 10 Oct 2018 13:33:30 -0500
Josh Poimboeuf  wrote:

> Re-reading your suggestion, I may have misunderstood what you're
> suggesting here, but I'm thinking about doing something like what you
> proposed earlier:
> 
> GLOBAL(tramp)
>   jmp *current_func(%rip)
> ENDPROC(tramp)
> 
> That is, doing an indirect jump instead of the above direct jump, so
> that any previous references to the trampoline would still work (and it
> would also work during early boot).
> 
> Though it should probably be a retpoline instead of an indirect jump.

But do we care, as it only takes place during text_poke_bp() right?

I don't think we need to worry about training trampoline branch
prediction that can only be hit when something enables the jump.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Steven Rostedt

On Wed, 10 Oct 2018 13:33:30 -0500
Josh Poimboeuf  wrote:

> Re-reading your suggestion, I may have misunderstood what you're
> suggesting here, but I'm thinking about doing something like what you
> proposed earlier:
> 
> GLOBAL(tramp)
>   jmp *current_func(%rip)
> ENDPROC(tramp)
> 
> That is, doing an indirect jump instead of the above direct jump, so
> that any previous references to the trampoline would still work (and it
> would also work during early boot).
> 
> Though it should probably be a retpoline instead of an indirect jump.

But do we care, as it only takes place during text_poke_bp() right?

I don't think we need to worry about training trampoline branch
prediction that can only be hit when something enables the jump.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > +#define DEFINE_STATIC_CALL(tramp, func)  
> > >   \
> > > +   DECLARE_STATIC_CALL(tramp, func);   \
> > > +   asm(".pushsection .text, \"ax\" \n" \
> > > +   ".align 4   \n" \
> > > +   ".globl " #tramp "  \n" \
> > > +   ".type " #tramp ", @function\n" \
> > > +   #tramp ":   \n" \
> > > +   "jmp " #func "  \n" \
> > 
> > I think this would be nicer as an indirect call that gets patched to a
> > direct call so that the update mechanism works even before it's
> > initialized.  (Currently static_branch blows up horribly if you try to
> > update one too early, and that's rather annoying IMO.)
> 
> Yeah, that would be better.  It would also allow trampoline function
> pointers to work, which I think you mentioned elsewhere.  And then I
> shouldn't trample this code in __static_call_update() -- that was
> already kind of nasty anyway.

Re-reading your suggestion, I may have misunderstood what you're
suggesting here, but I'm thinking about doing something like what you
proposed earlier:

GLOBAL(tramp)
  jmp *current_func(%rip)
ENDPROC(tramp)

That is, doing an indirect jump instead of the above direct jump, so
that any previous references to the trampoline would still work (and it
would also work during early boot).

Though it should probably be a retpoline instead of an indirect jump.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> > > +#define DEFINE_STATIC_CALL(tramp, func)  
> > >   \
> > > +   DECLARE_STATIC_CALL(tramp, func);   \
> > > +   asm(".pushsection .text, \"ax\" \n" \
> > > +   ".align 4   \n" \
> > > +   ".globl " #tramp "  \n" \
> > > +   ".type " #tramp ", @function\n" \
> > > +   #tramp ":   \n" \
> > > +   "jmp " #func "  \n" \
> > 
> > I think this would be nicer as an indirect call that gets patched to a
> > direct call so that the update mechanism works even before it's
> > initialized.  (Currently static_branch blows up horribly if you try to
> > update one too early, and that's rather annoying IMO.)
> 
> Yeah, that would be better.  It would also allow trampoline function
> pointers to work, which I think you mentioned elsewhere.  And then I
> shouldn't trample this code in __static_call_update() -- that was
> already kind of nasty anyway.

Re-reading your suggestion, I may have misunderstood what you're
suggesting here, but I'm thinking about doing something like what you
proposed earlier:

GLOBAL(tramp)
  jmp *current_func(%rip)
ENDPROC(tramp)

That is, doing an indirect jump instead of the above direct jump, so
that any previous references to the trampoline would still work (and it
would also work during early boot).

Though it should probably be a retpoline instead of an indirect jump.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > +#define DECLARE_STATIC_CALL(tramp, func)   \
> > > +   extern typeof(func) tramp;  \
> > > +   static void __used __section(.discard.static_call_tramps)   \
> > > +   *__static_call_tramp_##tramp = tramp
> > > +
> > 
> > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > why is a DECLARE_ macro defining a variable?
> 
> This is the magic needed for objtool to find all the call sites.
> 
> The variable itself isn't needed, but the .discard.static_call_tramps
> entry is.  Objtool reads that section to find out which function call
> sites are targeted to a static call trampoline.

To clarify: objtool reads that section to find out which functions are
really static call trampolines.  Then it annotates all the instructions
which call/jmp to those trampolines.  Those annotations are then read by
the kernel.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 01:16:05PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > > +#define DECLARE_STATIC_CALL(tramp, func)   \
> > > +   extern typeof(func) tramp;  \
> > > +   static void __used __section(.discard.static_call_tramps)   \
> > > +   *__static_call_tramp_##tramp = tramp
> > > +
> > 
> > Confused.  What's the __static_call_tramp_##tramp variable for?  And
> > why is a DECLARE_ macro defining a variable?
> 
> This is the magic needed for objtool to find all the call sites.
> 
> The variable itself isn't needed, but the .discard.static_call_tramps
> entry is.  Objtool reads that section to find out which function call
> sites are targeted to a static call trampoline.

To clarify: objtool reads that section to find out which functions are
really static call trampolines.  Then it annotates all the instructions
which call/jmp to those trampolines.  Those annotations are then read by
the kernel.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > +#define DECLARE_STATIC_CALL(tramp, func)   \
> > +   extern typeof(func) tramp;  \
> > +   static void __used __section(.discard.static_call_tramps)   \
> > +   *__static_call_tramp_##tramp = tramp
> > +
> 
> Confused.  What's the __static_call_tramp_##tramp variable for?  And
> why is a DECLARE_ macro defining a variable?

This is the magic needed for objtool to find all the call sites.

The variable itself isn't needed, but the .discard.static_call_tramps
entry is.  Objtool reads that section to find out which function call
sites are targeted to a static call trampoline.

> > +#define DEFINE_STATIC_CALL(tramp, func)
> > \
> > +   DECLARE_STATIC_CALL(tramp, func);   \
> > +   asm(".pushsection .text, \"ax\" \n" \
> > +   ".align 4   \n" \
> > +   ".globl " #tramp "  \n" \
> > +   ".type " #tramp ", @function\n" \
> > +   #tramp ":   \n" \
> > +   "jmp " #func "  \n" \
> 
> I think this would be nicer as an indirect call that gets patched to a
> direct call so that the update mechanism works even before it's
> initialized.  (Currently static_branch blows up horribly if you try to
> update one too early, and that's rather annoying IMO.)

Yeah, that would be better.  It would also allow trampoline function
pointers to work, which I think you mentioned elsewhere.  And then I
shouldn't trample this code in __static_call_update() -- that was
already kind of nasty anyway.

> Also, I think you're looking for "jmp.d32", which is available in
> newer toolchains.  For older toolchains, you could use .byte 0xe9 or
> you could use a different section (I think) to force a real 32-bit
> jump.

Good idea.

> > +void __init static_call_init(void)
> > +{
> > +   struct static_call_entry *entry;
> > +   unsigned long insn, tramp, func;
> > +   unsigned char insn_opcode, tramp_opcode;
> > +   s32 call_dest;
> > +
> > +   for (entry = __start_static_call_table;
> > +entry < __stop_static_call_table; entry++) {
> > +
> > +   insn = (long)entry->insn + (unsigned long)>insn;
> > +   tramp = (long)entry->tramp + (unsigned long)>tramp;
> > +
> > +   insn_opcode = *(unsigned char *)insn;
> > +   if (insn_opcode != 0xe8 && insn_opcode != 0xe9) {
> > +   WARN_ONCE(1, "unexpected static call insn opcode %x 
> > at %pS",
> > + insn_opcode, (void *)insn);
> > +   continue;
> > +   }
> > +
> > +   tramp_opcode = *(unsigned char *)tramp;
> > +   if (tramp_opcode != 0xeb && tramp_opcode != 0xe9) {
> > +   WARN_ONCE(1, "unexpected trampoline jump opcode %x 
> > at %ps",
> > +tramp_opcode, (void *)tramp);
> > +   continue;
> > +   }
> > +
> > +   if (tramp_opcode == 0xeb)
> > +   func = *(s8 *)(tramp + 1) + (tramp + 2);
> 
> I realize you expect some instances of 0xeb due to the assembler
> messing you up (see above), but this seems a bit too permissive, since
> a 0xeb without the appropriate set of NOPs is going to explode.  And:

Yep.

> > +   else
> > +   func = *(s32 *)(tramp + 1) + (tramp + 5);
> > +
> > +   call_dest = (long)(func) - (long)(insn + 5);
> > +
> > +   printk("static_call_init: poking %lx at %lx\n", (unsigned 
> > long)call_dest, (insn+1));
> > +
> > +   text_poke_early((void *)(insn + 1), _dest, 4);
> 
> If you really are going to rewrite an 8-bit jump to a 32-bit jump, I
> think you need to actually patch the opcode byte :)

Oops :-)

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Wed, Oct 10, 2018 at 11:03:43AM -0700, Andy Lutomirski wrote:
> > +#define DECLARE_STATIC_CALL(tramp, func)   \
> > +   extern typeof(func) tramp;  \
> > +   static void __used __section(.discard.static_call_tramps)   \
> > +   *__static_call_tramp_##tramp = tramp
> > +
> 
> Confused.  What's the __static_call_tramp_##tramp variable for?  And
> why is a DECLARE_ macro defining a variable?

This is the magic needed for objtool to find all the call sites.

The variable itself isn't needed, but the .discard.static_call_tramps
entry is.  Objtool reads that section to find out which function call
sites are targeted to a static call trampoline.

> > +#define DEFINE_STATIC_CALL(tramp, func)
> > \
> > +   DECLARE_STATIC_CALL(tramp, func);   \
> > +   asm(".pushsection .text, \"ax\" \n" \
> > +   ".align 4   \n" \
> > +   ".globl " #tramp "  \n" \
> > +   ".type " #tramp ", @function\n" \
> > +   #tramp ":   \n" \
> > +   "jmp " #func "  \n" \
> 
> I think this would be nicer as an indirect call that gets patched to a
> direct call so that the update mechanism works even before it's
> initialized.  (Currently static_branch blows up horribly if you try to
> update one too early, and that's rather annoying IMO.)

Yeah, that would be better.  It would also allow trampoline function
pointers to work, which I think you mentioned elsewhere.  And then I
shouldn't trample this code in __static_call_update() -- that was
already kind of nasty anyway.

> Also, I think you're looking for "jmp.d32", which is available in
> newer toolchains.  For older toolchains, you could use .byte 0xe9 or
> you could use a different section (I think) to force a real 32-bit
> jump.

Good idea.

> > +void __init static_call_init(void)
> > +{
> > +   struct static_call_entry *entry;
> > +   unsigned long insn, tramp, func;
> > +   unsigned char insn_opcode, tramp_opcode;
> > +   s32 call_dest;
> > +
> > +   for (entry = __start_static_call_table;
> > +entry < __stop_static_call_table; entry++) {
> > +
> > +   insn = (long)entry->insn + (unsigned long)>insn;
> > +   tramp = (long)entry->tramp + (unsigned long)>tramp;
> > +
> > +   insn_opcode = *(unsigned char *)insn;
> > +   if (insn_opcode != 0xe8 && insn_opcode != 0xe9) {
> > +   WARN_ONCE(1, "unexpected static call insn opcode %x 
> > at %pS",
> > + insn_opcode, (void *)insn);
> > +   continue;
> > +   }
> > +
> > +   tramp_opcode = *(unsigned char *)tramp;
> > +   if (tramp_opcode != 0xeb && tramp_opcode != 0xe9) {
> > +   WARN_ONCE(1, "unexpected trampoline jump opcode %x 
> > at %ps",
> > +tramp_opcode, (void *)tramp);
> > +   continue;
> > +   }
> > +
> > +   if (tramp_opcode == 0xeb)
> > +   func = *(s8 *)(tramp + 1) + (tramp + 2);
> 
> I realize you expect some instances of 0xeb due to the assembler
> messing you up (see above), but this seems a bit too permissive, since
> a 0xeb without the appropriate set of NOPs is going to explode.  And:

Yep.

> > +   else
> > +   func = *(s32 *)(tramp + 1) + (tramp + 5);
> > +
> > +   call_dest = (long)(func) - (long)(insn + 5);
> > +
> > +   printk("static_call_init: poking %lx at %lx\n", (unsigned 
> > long)call_dest, (insn+1));
> > +
> > +   text_poke_early((void *)(insn + 1), _dest, 4);
> 
> If you really are going to rewrite an 8-bit jump to a 32-bit jump, I
> think you need to actually patch the opcode byte :)

Oops :-)

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 10:52 AM Josh Poimboeuf  wrote:
>
> On Mon, Oct 08, 2018 at 11:57:50PM -0400, Steven Rostedt wrote:
> > On Mon, 8 Oct 2018 21:17:10 -0500
> > Josh Poimboeuf  wrote:
> >
> > > I'm not really convinced we need objtool for this, maybe I'll try
> > > whipping up a POC.
> >
> > Awesome!
> >
> > I wasn't thinking of actually having objtool itself perform this task,
> > but instead breaking the internals of objtool up into more of a generic
> > infrastructure, that recordmcount.c, objtool, and whatever this does
> > can use.
>
> So I had been thinking that we could find the call sites at runtime, by
> looking at the relocations.  But I managed to forget that vmlinux
> relocations are resolved during linking.  So yeah, some kind of tooling
> magic would be needed.
>
> I worked up a POC using objtool.  It doesn't *have* to be done with
> objtool, but since it's already reading/writing all the ELF stuff
> anyway, it was pretty easy to add this on.
>
> This patch has at least a few issues:
>
> - No module support.
>
> - For some reason, the sync_cores in text_poke_bp() don't always seem to
>   be working as expected.  Running this patch on my VM, the test code in
>   cmdline_proc_show() works *most* of the time, but it occasionally
>   branches off into the weeds.  I have no idea what the problem is yet.
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9d329608913e..20ff5624dad7 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
>   architectures, and don't require runtime relocation on relocatable
>   kernels.
>
> +config HAVE_ARCH_STATIC_CALL
> +   bool
> +
>  source "kernel/gcov/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5136a1281870..1a14c8f87876 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -128,6 +128,7 @@ config X86
> select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
> select HAVE_ARCH_PREL32_RELOCATIONS
> select HAVE_ARCH_SECCOMP_FILTER
> +   select HAVE_ARCH_STATIC_CALLif X86_64
> select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/x86/include/asm/static_call.h 
> b/arch/x86/include/asm/static_call.h
> new file mode 100644
> index ..40fec631b760
> --- /dev/null
> +++ b/arch/x86/include/asm/static_call.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_STATIC_CALL_H
> +#define _ASM_STATIC_CALL_H
> +
> +#ifdef CONFIG_X86_64
> +
> +#include 
> +
> +void static_call_init(void);
> +extern void __static_call_update(void *tramp, void *func);
> +
> +#define DECLARE_STATIC_CALL(tramp, func)   \
> +   extern typeof(func) tramp;  \
> +   static void __used __section(.discard.static_call_tramps)   \
> +   *__static_call_tramp_##tramp = tramp
> +

Confused.  What's the __static_call_tramp_##tramp variable for?  And
why is a DECLARE_ macro defining a variable?

> +#define DEFINE_STATIC_CALL(tramp, func)  
>   \
> +   DECLARE_STATIC_CALL(tramp, func);   \
> +   asm(".pushsection .text, \"ax\" \n" \
> +   ".align 4   \n" \
> +   ".globl " #tramp "  \n" \
> +   ".type " #tramp ", @function\n" \
> +   #tramp ":   \n" \
> +   "jmp " #func "  \n" \

I think this would be nicer as an indirect call that gets patched to a
direct call so that the update mechanism works even before it's
initialized.  (Currently static_branch blows up horribly if you try to
update one too early, and that's rather annoying IMO.)

Also, I think you're looking for "jmp.d32", which is available in
newer toolchains.  For older toolchains, you could use .byte 0xe9 or
you could use a different section (I think) to force a real 32-bit
jump.

> +void __init static_call_init(void)
> +{
> +   struct static_call_entry *entry;
> +   unsigned long insn, tramp, func;
> +   unsigned char insn_opcode, tramp_opcode;
> +   s32 call_dest;
> +
> +   for (entry = __start_static_call_table;
> +entry < __stop_static_call_table; entry++) {
> +
> +   insn = (long)entry->insn + (unsigned long)>insn;
> +   tramp = (long)entry->tramp + (unsigned long)>tramp;
> +
> +   insn_opcode = *(unsigned char *)insn;
> +   if (insn_opcode != 0xe8 && insn_opcode != 0xe9) {
> +   WARN_ONCE(1, "unexpected static call insn opcode %x 
> at %pS",
> + insn_opcode, (void *)insn);
> +

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Andy Lutomirski

On Wed, Oct 10, 2018 at 10:52 AM Josh Poimboeuf  wrote:
>
> On Mon, Oct 08, 2018 at 11:57:50PM -0400, Steven Rostedt wrote:
> > On Mon, 8 Oct 2018 21:17:10 -0500
> > Josh Poimboeuf  wrote:
> >
> > > I'm not really convinced we need objtool for this, maybe I'll try
> > > whipping up a POC.
> >
> > Awesome!
> >
> > I wasn't thinking of actually having objtool itself perform this task,
> > but instead breaking the internals of objtool up into more of a generic
> > infrastructure, that recordmcount.c, objtool, and whatever this does
> > can use.
>
> So I had been thinking that we could find the call sites at runtime, by
> looking at the relocations.  But I managed to forget that vmlinux
> relocations are resolved during linking.  So yeah, some kind of tooling
> magic would be needed.
>
> I worked up a POC using objtool.  It doesn't *have* to be done with
> objtool, but since it's already reading/writing all the ELF stuff
> anyway, it was pretty easy to add this on.
>
> This patch has at least a few issues:
>
> - No module support.
>
> - For some reason, the sync_cores in text_poke_bp() don't always seem to
>   be working as expected.  Running this patch on my VM, the test code in
>   cmdline_proc_show() works *most* of the time, but it occasionally
>   branches off into the weeds.  I have no idea what the problem is yet.
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9d329608913e..20ff5624dad7 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
>   architectures, and don't require runtime relocation on relocatable
>   kernels.
>
> +config HAVE_ARCH_STATIC_CALL
> +   bool
> +
>  source "kernel/gcov/Kconfig"
>
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5136a1281870..1a14c8f87876 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -128,6 +128,7 @@ config X86
> select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
> select HAVE_ARCH_PREL32_RELOCATIONS
> select HAVE_ARCH_SECCOMP_FILTER
> +   select HAVE_ARCH_STATIC_CALLif X86_64
> select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/x86/include/asm/static_call.h 
> b/arch/x86/include/asm/static_call.h
> new file mode 100644
> index ..40fec631b760
> --- /dev/null
> +++ b/arch/x86/include/asm/static_call.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_STATIC_CALL_H
> +#define _ASM_STATIC_CALL_H
> +
> +#ifdef CONFIG_X86_64
> +
> +#include 
> +
> +void static_call_init(void);
> +extern void __static_call_update(void *tramp, void *func);
> +
> +#define DECLARE_STATIC_CALL(tramp, func)   \
> +   extern typeof(func) tramp;  \
> +   static void __used __section(.discard.static_call_tramps)   \
> +   *__static_call_tramp_##tramp = tramp
> +

Confused.  What's the __static_call_tramp_##tramp variable for?  And
why is a DECLARE_ macro defining a variable?

> +#define DEFINE_STATIC_CALL(tramp, func)  
>   \
> +   DECLARE_STATIC_CALL(tramp, func);   \
> +   asm(".pushsection .text, \"ax\" \n" \
> +   ".align 4   \n" \
> +   ".globl " #tramp "  \n" \
> +   ".type " #tramp ", @function\n" \
> +   #tramp ":   \n" \
> +   "jmp " #func "  \n" \

I think this would be nicer as an indirect call that gets patched to a
direct call so that the update mechanism works even before it's
initialized.  (Currently static_branch blows up horribly if you try to
update one too early, and that's rather annoying IMO.)

Also, I think you're looking for "jmp.d32", which is available in
newer toolchains.  For older toolchains, you could use .byte 0xe9 or
you could use a different section (I think) to force a real 32-bit
jump.

> +void __init static_call_init(void)
> +{
> +   struct static_call_entry *entry;
> +   unsigned long insn, tramp, func;
> +   unsigned char insn_opcode, tramp_opcode;
> +   s32 call_dest;
> +
> +   for (entry = __start_static_call_table;
> +entry < __stop_static_call_table; entry++) {
> +
> +   insn = (long)entry->insn + (unsigned long)>insn;
> +   tramp = (long)entry->tramp + (unsigned long)>tramp;
> +
> +   insn_opcode = *(unsigned char *)insn;
> +   if (insn_opcode != 0xe8 && insn_opcode != 0xe9) {
> +   WARN_ONCE(1, "unexpected static call insn opcode %x 
> at %pS",
> + insn_opcode, (void *)insn);
> +

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Mon, Oct 08, 2018 at 11:57:50PM -0400, Steven Rostedt wrote:
> On Mon, 8 Oct 2018 21:17:10 -0500
> Josh Poimboeuf  wrote:
> 
> > I'm not really convinced we need objtool for this, maybe I'll try
> > whipping up a POC.
> 
> Awesome!
> 
> I wasn't thinking of actually having objtool itself perform this task,
> but instead breaking the internals of objtool up into more of a generic
> infrastructure, that recordmcount.c, objtool, and whatever this does
> can use.

So I had been thinking that we could find the call sites at runtime, by
looking at the relocations.  But I managed to forget that vmlinux
relocations are resolved during linking.  So yeah, some kind of tooling
magic would be needed.

I worked up a POC using objtool.  It doesn't *have* to be done with
objtool, but since it's already reading/writing all the ELF stuff
anyway, it was pretty easy to add this on.

This patch has at least a few issues:

- No module support.

- For some reason, the sync_cores in text_poke_bp() don't always seem to
  be working as expected.  Running this patch on my VM, the test code in
  cmdline_proc_show() works *most* of the time, but it occasionally
  branches off into the weeds.  I have no idea what the problem is yet.

diff --git a/arch/Kconfig b/arch/Kconfig
index 9d329608913e..20ff5624dad7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
  architectures, and don't require runtime relocation on relocatable
  kernels.
 
+config HAVE_ARCH_STATIC_CALL
+   bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5136a1281870..1a14c8f87876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,6 +128,7 @@ config X86
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_STATIC_CALLif X86_64
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/static_call.h 
b/arch/x86/include/asm/static_call.h
new file mode 100644
index ..40fec631b760
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+#ifdef CONFIG_X86_64
+
+#include 
+
+void static_call_init(void);
+extern void __static_call_update(void *tramp, void *func);
+
+#define DECLARE_STATIC_CALL(tramp, func)   \
+   extern typeof(func) tramp;  \
+   static void __used __section(.discard.static_call_tramps)   \
+   *__static_call_tramp_##tramp = tramp
+
+#define DEFINE_STATIC_CALL(tramp, func)
\
+   DECLARE_STATIC_CALL(tramp, func);   \
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 4   \n" \
+   ".globl " #tramp "  \n" \
+   ".type " #tramp ", @function\n" \
+   #tramp ":   \n" \
+   "jmp " #func "  \n" \
+   ASM_NOP3 "  \n" \
+   ".popsection\n")
+
+#define static_call_update(tramp, func)
\
+   __static_call_update(tramp, func)
+
+#else /* !CONFIG_X86_64 */
+static inline void static_call_init(void) {}
+#endif
+
+#endif /* _ASM_STATIC_CALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..e5d9f3a1e73f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -62,6 +62,7 @@ obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y  += pci-iommu_table.o
 obj-y  += resource.o
 obj-y  += irqflags.o
+obj-$(CONFIG_X86_64)   += static_call.o
 
 obj-y  += process.o
 obj-y  += fpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4866badb235..447401fc8d65 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -117,6 +117,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -874,6 +875,7 @@ void __init setup_arch(char **cmdline_p)
early_cpu_init();
arch_init_ideal_nops();
jump_label_init();
+   static_call_init();
early_ioremap_init();
 
setup_olpc_ofw_pgd();
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
new file mode 100644
index ..e7a17ee6942d
--- /dev/null
+++

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-10 Thread Josh Poimboeuf

On Mon, Oct 08, 2018 at 11:57:50PM -0400, Steven Rostedt wrote:
> On Mon, 8 Oct 2018 21:17:10 -0500
> Josh Poimboeuf  wrote:
> 
> > I'm not really convinced we need objtool for this, maybe I'll try
> > whipping up a POC.
> 
> Awesome!
> 
> I wasn't thinking of actually having objtool itself perform this task,
> but instead breaking the internals of objtool up into more of a generic
> infrastructure, that recordmcount.c, objtool, and whatever this does
> can use.

So I had been thinking that we could find the call sites at runtime, by
looking at the relocations.  But I managed to forget that vmlinux
relocations are resolved during linking.  So yeah, some kind of tooling
magic would be needed.

I worked up a POC using objtool.  It doesn't *have* to be done with
objtool, but since it's already reading/writing all the ELF stuff
anyway, it was pretty easy to add this on.

This patch has at least a few issues:

- No module support.

- For some reason, the sync_cores in text_poke_bp() don't always seem to
  be working as expected.  Running this patch on my VM, the test code in
  cmdline_proc_show() works *most* of the time, but it occasionally
  branches off into the weeds.  I have no idea what the problem is yet.

diff --git a/arch/Kconfig b/arch/Kconfig
index 9d329608913e..20ff5624dad7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -865,6 +865,9 @@ config HAVE_ARCH_PREL32_RELOCATIONS
  architectures, and don't require runtime relocation on relocatable
  kernels.
 
+config HAVE_ARCH_STATIC_CALL
+   bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5136a1281870..1a14c8f87876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,6 +128,7 @@ config X86
select HAVE_ARCH_COMPAT_MMAP_BASES  if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_STATIC_CALLif X86_64
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/static_call.h 
b/arch/x86/include/asm/static_call.h
new file mode 100644
index ..40fec631b760
--- /dev/null
+++ b/arch/x86/include/asm/static_call.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_STATIC_CALL_H
+#define _ASM_STATIC_CALL_H
+
+#ifdef CONFIG_X86_64
+
+#include 
+
+void static_call_init(void);
+extern void __static_call_update(void *tramp, void *func);
+
+#define DECLARE_STATIC_CALL(tramp, func)   \
+   extern typeof(func) tramp;  \
+   static void __used __section(.discard.static_call_tramps)   \
+   *__static_call_tramp_##tramp = tramp
+
+#define DEFINE_STATIC_CALL(tramp, func)
\
+   DECLARE_STATIC_CALL(tramp, func);   \
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 4   \n" \
+   ".globl " #tramp "  \n" \
+   ".type " #tramp ", @function\n" \
+   #tramp ":   \n" \
+   "jmp " #func "  \n" \
+   ASM_NOP3 "  \n" \
+   ".popsection\n")
+
+#define static_call_update(tramp, func)
\
+   __static_call_update(tramp, func)
+
+#else /* !CONFIG_X86_64 */
+static inline void static_call_init(void) {}
+#endif
+
+#endif /* _ASM_STATIC_CALL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..e5d9f3a1e73f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -62,6 +62,7 @@ obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y  += pci-iommu_table.o
 obj-y  += resource.o
 obj-y  += irqflags.o
+obj-$(CONFIG_X86_64)   += static_call.o
 
 obj-y  += process.o
 obj-y  += fpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4866badb235..447401fc8d65 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -117,6 +117,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -874,6 +875,7 @@ void __init setup_arch(char **cmdline_p)
early_cpu_init();
arch_init_ideal_nops();
jump_label_init();
+   static_call_init();
early_ioremap_init();
 
setup_olpc_ofw_pgd();
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
new file mode 100644
index ..e7a17ee6942d
--- /dev/null
+++

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-09 Thread Masami Hiramatsu

On Mon, 8 Oct 2018 23:55:34 -0400
Steven Rostedt  wrote:

> On Tue, 9 Oct 2018 12:44:01 +0900
> Masami Hiramatsu  wrote:
> 
> > On Fri, 05 Oct 2018 21:51:11 -0400
> > Steven Rostedt  wrote:
> > 
> > > +typedef long dynfunc_t;
> > > +
> > > +struct dynfunc_struct;
> > > +
> > > +#define arch_dynfunc_trampoline(name, def)   \
> > > + asm volatile (  \
> > > + ".globl dynfunc_" #name "; \n\t"\
> > > + "dynfunc_" #name ": \n\t"   \
> > > + "jmp " #def " \n\t" \
> > > + ".balign 8 \n \t"   \
> > > + : : : "memory" )
> > > +  
> > 
> > I have just a question, what is this different from livepatch? :)
> 
> I actually thought about this a bit, but decided against it.
> 
> I didn't want to hook another infrastructure into the fentry nop. It's
> already complex enough with kprobes, live patching and ftrace.
> 
> The ideal solution is what Peter suggested, and that's to patch the
> call sites, and I think that is attainable with objtool modifications.

OK, the ideal solution sounds good to me. 

> 
> > 
> > I think we can replace the first 5 bytes of the default function
> > to jmp instruction (to alternative function) instead of making
> > this trampoline.
> > 
> > IOW, as far as I can see, this is changing
> > 
> > 
> > call %reg (or retpoline_reg)
> > 
> > 
> > to 
> > 
> > 
> > call dynfunc_A
> > 
> > dynfunc_A:
> > jmp func_A or altered_func_A
> > 
> > 
> > If so, why don't we put the jmp on default func_A directly?
> > 
> > call func_A
> > 
> > func_A:
> > "jmp altered_func" or "original sequence"
> > 
> > (this is idealy same as jprobes did)
> > 
> > Of course we have to arbitrate it with ftrace (fentry) but it may
> > not so hard (simplest way is just adding "notrace" on the default
> > function)
> 
> Then we lose the 5 byte nop.

Yeah, but we can remove the trampoline code.

> > BTW, I think "dynamic_function" may not correct name, it may be
> > "alternative_function" or something like that, because this
> > function must be replaced system-wide and this means we can
> > not use this for generic function pointer usage which depends
> > on thread context (like file_operations). But good for something
> > pluggable code (LSM?).
> 
> I don't like the name alternative, as that's usually a one shot deal
> (SMP vs UP).
> 
> It is dynamic, as it's a function that changes dynamically. Yes its
> global, but that's not mutually exclusive to dynamic.

OK, so we may add a note that this is "global" patching :)

> The use case I want this for is for tracing. But it can be useful for
> KVM and power management governors. Basically anything that has a
> global function pointer (hmm, even the idle call can use this).

Indeed.

Thanks,

> 
> -- Steve


-- 
Masami Hiramatsu

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-09 Thread Masami Hiramatsu

On Mon, 8 Oct 2018 23:55:34 -0400
Steven Rostedt  wrote:

> On Tue, 9 Oct 2018 12:44:01 +0900
> Masami Hiramatsu  wrote:
> 
> > On Fri, 05 Oct 2018 21:51:11 -0400
> > Steven Rostedt  wrote:
> > 
> > > +typedef long dynfunc_t;
> > > +
> > > +struct dynfunc_struct;
> > > +
> > > +#define arch_dynfunc_trampoline(name, def)   \
> > > + asm volatile (  \
> > > + ".globl dynfunc_" #name "; \n\t"\
> > > + "dynfunc_" #name ": \n\t"   \
> > > + "jmp " #def " \n\t" \
> > > + ".balign 8 \n \t"   \
> > > + : : : "memory" )
> > > +  
> > 
> > I have just a question, what is this different from livepatch? :)
> 
> I actually thought about this a bit, but decided against it.
> 
> I didn't want to hook another infrastructure into the fentry nop. It's
> already complex enough with kprobes, live patching and ftrace.
> 
> The ideal solution is what Peter suggested, and that's to patch the
> call sites, and I think that is attainable with objtool modifications.

OK, the ideal solution sounds good to me. 

> 
> > 
> > I think we can replace the first 5 bytes of the default function
> > to jmp instruction (to alternative function) instead of making
> > this trampoline.
> > 
> > IOW, as far as I can see, this is changing
> > 
> > 
> > call %reg (or retpoline_reg)
> > 
> > 
> > to 
> > 
> > 
> > call dynfunc_A
> > 
> > dynfunc_A:
> > jmp func_A or altered_func_A
> > 
> > 
> > If so, why don't we put the jmp on default func_A directly?
> > 
> > call func_A
> > 
> > func_A:
> > "jmp altered_func" or "original sequence"
> > 
> > (this is idealy same as jprobes did)
> > 
> > Of course we have to arbitrate it with ftrace (fentry) but it may
> > not so hard (simplest way is just adding "notrace" on the default
> > function)
> 
> Then we lose the 5 byte nop.

Yeah, but we can remove the trampoline code.

> > BTW, I think "dynamic_function" may not correct name, it may be
> > "alternative_function" or something like that, because this
> > function must be replaced system-wide and this means we can
> > not use this for generic function pointer usage which depends
> > on thread context (like file_operations). But good for something
> > pluggable code (LSM?).
> 
> I don't like the name alternative, as that's usually a one shot deal
> (SMP vs UP).
> 
> It is dynamic, as it's a function that changes dynamically. Yes its
> global, but that's not mutually exclusive to dynamic.

OK, so we may add a note that this is "global" patching :)

> The use case I want this for is for tracing. But it can be useful for
> KVM and power management governors. Basically anything that has a
> global function pointer (hmm, even the idle call can use this).

Indeed.

Thanks,

> 
> -- Steve


-- 
Masami Hiramatsu

RE: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-09 Thread David Laight

From: Masami Hiramatsu
> Sent: 09 October 2018 04:44
...
> I think we can replace the first 5 bytes of the default function
> to jmp instruction (to alternative function) instead of making
> this trampoline.

Or have a trampoline that is just a jump instruction and overwrite
the target address at run time to select the non-default code.
With care the target address can be aligned so that the write is atomic
and can be done while other cpu might be calling the function.

This will be lower impact that a 'jump indirect' - especially since
the latter would have to be implemented using a 'retpoline'.

It would also make it possible to re-instate the default function.
(By saving its address after the jump.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

RE: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-09 Thread David Laight

From: Masami Hiramatsu
> Sent: 09 October 2018 04:44
...
> I think we can replace the first 5 bytes of the default function
> to jmp instruction (to alternative function) instead of making
> this trampoline.

Or have a trampoline that is just a jump instruction and overwrite
the target address at run time to select the non-default code.
With care the target address can be aligned so that the write is atomic
and can be done while other cpu might be calling the function.

This will be lower impact that a 'jump indirect' - especially since
the latter would have to be implemented using a 'retpoline'.

It would also make it possible to re-instate the default function.
(By saving its address after the jump.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 21:17:10 -0500
Josh Poimboeuf  wrote:

> I'm not really convinced we need objtool for this, maybe I'll try
> whipping up a POC.

Awesome!

I wasn't thinking of actually having objtool itself perform this task,
but instead breaking the internals of objtool up into more of a generic
infrastructure, that recordmcount.c, objtool, and whatever this does
can use.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 21:17:10 -0500
Josh Poimboeuf  wrote:

> I'm not really convinced we need objtool for this, maybe I'll try
> whipping up a POC.

Awesome!

I wasn't thinking of actually having objtool itself perform this task,
but instead breaking the internals of objtool up into more of a generic
infrastructure, that recordmcount.c, objtool, and whatever this does
can use.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Tue, 9 Oct 2018 12:44:01 +0900
Masami Hiramatsu  wrote:

> On Fri, 05 Oct 2018 21:51:11 -0400
> Steven Rostedt  wrote:
> 
> > +typedef long dynfunc_t;
> > +
> > +struct dynfunc_struct;
> > +
> > +#define arch_dynfunc_trampoline(name, def) \
> > +   asm volatile (  \
> > +   ".globl dynfunc_" #name "; \n\t"\
> > +   "dynfunc_" #name ": \n\t"   \
> > +   "jmp " #def " \n\t" \
> > +   ".balign 8 \n \t"   \
> > +   : : : "memory" )
> > +  
> 
> I have just a question, what is this different from livepatch? :)

I actually thought about this a bit, but decided against it.

I didn't want to hook another infrastructure into the fentry nop. It's
already complex enough with kprobes, live patching and ftrace.

The ideal solution is what Peter suggested, and that's to patch the
call sites, and I think that is attainable with objtool modifications.

> 
> I think we can replace the first 5 bytes of the default function
> to jmp instruction (to alternative function) instead of making
> this trampoline.
> 
> IOW, as far as I can see, this is changing
> 
> 
> call %reg (or retpoline_reg)
> 
> 
> to 
> 
> 
> call dynfunc_A
> 
> dynfunc_A:
> jmp func_A or altered_func_A
> 
> 
> If so, why don't we put the jmp on default func_A directly?
> 
> call func_A
> 
> func_A:
> "jmp altered_func" or "original sequence"
> 
> (this is idealy same as jprobes did)
> 
> Of course we have to arbitrate it with ftrace (fentry) but it may
> not so hard (simplest way is just adding "notrace" on the default
> function)

Then we lose the 5 byte nop.

> 
> BTW, I think "dynamic_function" may not correct name, it may be
> "alternative_function" or something like that, because this
> function must be replaced system-wide and this means we can
> not use this for generic function pointer usage which depends
> on thread context (like file_operations). But good for something
> pluggable code (LSM?).

I don't like the name alternative, as that's usually a one shot deal
(SMP vs UP).

It is dynamic, as it's a function that changes dynamically. Yes its
global, but that's not mutually exclusive to dynamic.

The use case I want this for is for tracing. But it can be useful for
KVM and power management governors. Basically anything that has a
global function pointer (hmm, even the idle call can use this).

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Tue, 9 Oct 2018 12:44:01 +0900
Masami Hiramatsu  wrote:

> On Fri, 05 Oct 2018 21:51:11 -0400
> Steven Rostedt  wrote:
> 
> > +typedef long dynfunc_t;
> > +
> > +struct dynfunc_struct;
> > +
> > +#define arch_dynfunc_trampoline(name, def) \
> > +   asm volatile (  \
> > +   ".globl dynfunc_" #name "; \n\t"\
> > +   "dynfunc_" #name ": \n\t"   \
> > +   "jmp " #def " \n\t" \
> > +   ".balign 8 \n \t"   \
> > +   : : : "memory" )
> > +  
> 
> I have just a question, what is this different from livepatch? :)

I actually thought about this a bit, but decided against it.

I didn't want to hook another infrastructure into the fentry nop. It's
already complex enough with kprobes, live patching and ftrace.

The ideal solution is what Peter suggested, and that's to patch the
call sites, and I think that is attainable with objtool modifications.

> 
> I think we can replace the first 5 bytes of the default function
> to jmp instruction (to alternative function) instead of making
> this trampoline.
> 
> IOW, as far as I can see, this is changing
> 
> 
> call %reg (or retpoline_reg)
> 
> 
> to 
> 
> 
> call dynfunc_A
> 
> dynfunc_A:
> jmp func_A or altered_func_A
> 
> 
> If so, why don't we put the jmp on default func_A directly?
> 
> call func_A
> 
> func_A:
> "jmp altered_func" or "original sequence"
> 
> (this is idealy same as jprobes did)
> 
> Of course we have to arbitrate it with ftrace (fentry) but it may
> not so hard (simplest way is just adding "notrace" on the default
> function)

Then we lose the 5 byte nop.

> 
> BTW, I think "dynamic_function" may not correct name, it may be
> "alternative_function" or something like that, because this
> function must be replaced system-wide and this means we can
> not use this for generic function pointer usage which depends
> on thread context (like file_operations). But good for something
> pluggable code (LSM?).

I don't like the name alternative, as that's usually a one shot deal
(SMP vs UP).

It is dynamic, as it's a function that changes dynamically. Yes its
global, but that's not mutually exclusive to dynamic.

The use case I want this for is for tracing. But it can be useful for
KVM and power management governors. Basically anything that has a
global function pointer (hmm, even the idle call can use this).

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Masami Hiramatsu

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +typedef long dynfunc_t;
> +
> +struct dynfunc_struct;
> +
> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )
> +

I have just a question, what is this different from livepatch? :)

I think we can replace the first 5 bytes of the default function
to jmp instruction (to alternative function) instead of making
this trampoline.

IOW, as far as I can see, this is changing

call %reg (or retpoline_reg)

to 

call dynfunc_A

dynfunc_A:
jmp func_A or altered_func_A

If so, why don't we put the jmp on default func_A directly?

call func_A

func_A:
"jmp altered_func" or "original sequence"

(this is idealy same as jprobes did)

Of course we have to arbitrate it with ftrace (fentry) but it may
not so hard (simplest way is just adding "notrace" on the default
function)

BTW, I think "dynamic_function" may not correct name, it may be
"alternative_function" or something like that, because this
function must be replaced system-wide and this means we can
not use this for generic function pointer usage which depends
on thread context (like file_operations). But good for something
pluggable code (LSM?).

Thank you,

-- 
Masami Hiramatsu

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Masami Hiramatsu

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +typedef long dynfunc_t;
> +
> +struct dynfunc_struct;
> +
> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )
> +

I have just a question, what is this different from livepatch? :)

I think we can replace the first 5 bytes of the default function
to jmp instruction (to alternative function) instead of making
this trampoline.

IOW, as far as I can see, this is changing

call %reg (or retpoline_reg)

to 

call dynfunc_A

dynfunc_A:
jmp func_A or altered_func_A

If so, why don't we put the jmp on default func_A directly?

call func_A

func_A:
"jmp altered_func" or "original sequence"

(this is idealy same as jprobes did)

Of course we have to arbitrate it with ftrace (fentry) but it may
not so hard (simplest way is just adding "notrace" on the default
function)

BTW, I think "dynamic_function" may not correct name, it may be
"alternative_function" or something like that, because this
function must be replaced system-wide and this means we can
not use this for generic function pointer usage which depends
on thread context (like file_operations). But good for something
pluggable code (LSM?).

Thank you,

-- 
Masami Hiramatsu

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Josh Poimboeuf

On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but
> those are also one byte. I suppose GCC is unlikely to emit a prefixed
> form of any of these. So maybe we really can assume they’re all one
> byte.

I'm pretty sure only a basic JMP is used for tail calls.

> But there is a nasty potential special case: anything that takes the
> function’s address. This includes jump tables, computed gotos, and
> plain old function pointers. And I suspect that any of these could
> have one of the rather large number of CALL/JMP/Jcc bytes before the
> relocation by coincidence.

But those special cases aren't in a text section, right?  If we just
make sure the relocations are applied to a text section, and that
they're preceded by the CALL or JMP byte, wouldn't that be sufficient?

I'm not really convinced we need objtool for this, maybe I'll try
whipping up a POC.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Josh Poimboeuf

On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but
> those are also one byte. I suppose GCC is unlikely to emit a prefixed
> form of any of these. So maybe we really can assume they’re all one
> byte.

I'm pretty sure only a basic JMP is used for tail calls.

> But there is a nasty potential special case: anything that takes the
> function’s address. This includes jump tables, computed gotos, and
> plain old function pointers. And I suspect that any of these could
> have one of the rather large number of CALL/JMP/Jcc bytes before the
> relocation by coincidence.

But those special cases aren't in a text section, right?  If we just
make sure the relocations are applied to a text section, and that
they're preceded by the CALL or JMP byte, wouldn't that be sufficient?

I'm not really convinced we need objtool for this, maybe I'll try
whipping up a POC.

-- 
Josh

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 10:44 AM Jiri Kosina  wrote:
>
> On Mon, 8 Oct 2018, Ard Biesheuvel wrote:
>
> > Does that mean that architectures could opt out of doing the whole
> > objtool + relocation processing thing, and instead take the hit of
> > going through the trampoline for all calls?
>
> There are architectures that aren't [currently] supported by objtool at
> all anyway.
>

The the credit of most architectures, though, the only reason x86
would want to use objtool instead of digging the results directly out
of the relocation data is that x86 has an overcomplicated instruction
encoding and there's no fully reliable way to find the address of the
instruction that contains a given relocation without fully
disassembling the binary.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 10:44 AM Jiri Kosina  wrote:
>
> On Mon, 8 Oct 2018, Ard Biesheuvel wrote:
>
> > Does that mean that architectures could opt out of doing the whole
> > objtool + relocation processing thing, and instead take the hit of
> > going through the trampoline for all calls?
>
> There are architectures that aren't [currently] supported by objtool at
> all anyway.
>

The the credit of most architectures, though, the only reason x86
would want to use objtool instead of digging the results directly out
of the relocation data is that x86 has an overcomplicated instruction
encoding and there's no fully reliable way to find the address of the
instruction that contains a given relocation without fully
disassembling the binary.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 8 October 2018 at 19:44, Jiri Kosina  wrote:
> On Mon, 8 Oct 2018, Ard Biesheuvel wrote:
>
>> Does that mean that architectures could opt out of doing the whole
>> objtool + relocation processing thing, and instead take the hit of
>> going through the trampoline for all calls?
>
> There are architectures that aren't [currently] supported by objtool at
> all anyway.
>

That was kind of my point :-)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 8 October 2018 at 19:44, Jiri Kosina  wrote:
> On Mon, 8 Oct 2018, Ard Biesheuvel wrote:
>
>> Does that mean that architectures could opt out of doing the whole
>> objtool + relocation processing thing, and instead take the hit of
>> going through the trampoline for all calls?
>
> There are architectures that aren't [currently] supported by objtool at
> all anyway.
>

That was kind of my point :-)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Jiri Kosina

On Mon, 8 Oct 2018, Ard Biesheuvel wrote:

> Does that mean that architectures could opt out of doing the whole
> objtool + relocation processing thing, and instead take the hit of
> going through the trampoline for all calls?

There are architectures that aren't [currently] supported by objtool at 
all anyway.

-- 
Jiri Kosina
SUSE Labs

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Jiri Kosina

On Mon, 8 Oct 2018, Ard Biesheuvel wrote:

> Does that mean that architectures could opt out of doing the whole
> objtool + relocation processing thing, and instead take the hit of
> going through the trampoline for all calls?

There are architectures that aren't [currently] supported by objtool at 
all anyway.

-- 
Jiri Kosina
SUSE Labs

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 10:30 AM Ard Biesheuvel
 wrote:
>
> On 8 October 2018 at 19:25, Andy Lutomirski  wrote:
> > On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
> >>
> >> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> >> >
> >> >
> >> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  
> >> > > wrote:
> >> > >
> >> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >> > >>> Can't we hijack the relocation records for these functions before 
> >> > >>> they
> >> > >>> get thrown out in the (final) link pass or something?
> >> > >>
> >> > >> I could be talking out my arse here, but I thought we could do this,
> >> > >> too, then changed my mind.  The relocation records give us the
> >> > >> location of the call or jump operand, but they don’t give the address
> >> > >> of the beginning of the instruction.
> >> > >
> >> > > But that's like 1 byte before the operand, right? We could even double 
> >> > > check
> >> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> >> > >
> >> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> >> > > ,
> >> > > so if we have the PLT32 location, we also have the instruction 
> >> > > location. Or am
> >> > > I missing something?
> >> >
> >> > There’s also JMP and Jcc, any of which can be used for rail calls, but
> >> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
> >> > form of any of these. So maybe we really can assume they’re all one
> >> > byte.
> >>
> >> Oh, I had not considered tail calls..
> >>
> >> > But there is a nasty potential special case: anything that takes the
> >> > function’s address. This includes jump tables, computed gotos, and
> >> > plain old function pointers. And I suspect that any of these could
> >> > have one of the rather large number of CALL/JMP/Jcc bytes before the
> >> > relocation by coincidence.
> >>
> >> We can have objtool verify the CALL/JMP/Jcc only condition. So if
> >> someone tries to take the address of a patchable function, it will error
> >> out.
> >
> > I think we should just ignore the sites that take the address and
> > maybe issue a warning.  After all, GCC can create them all by itself.
> > We'll always have a plain wrapper function, and I think we should just
> > not patch code that takes its address.  So we do, roughly:
> >
> > void default_foo(void);
> >
> > GLOBAL(foo)
> >   jmp *current_foo(%rip)
> > ENDPROC(foo)
> >
> > And code that does:
> >
> > foo();
> >
> > as a call, a tail call, a conditional tail call, etc, gets discovered
> > by objtool + relocation processing or whatever and gets patched.  (And
> > foo() itself gets patched, too, as a special case.  But we patch foo
> > itself at some point during boot to turn it into a direct JMP.  Doing
> > it this way means that the whole mechanism works from very early
> > boot.)
>
> Does that mean that architectures could opt out of doing the whole
> objtool + relocation processing thing, and instead take the hit of
> going through the trampoline for all calls?
>

I don't see why not.

--Andy

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 10:30 AM Ard Biesheuvel
 wrote:
>
> On 8 October 2018 at 19:25, Andy Lutomirski  wrote:
> > On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
> >>
> >> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> >> >
> >> >
> >> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  
> >> > > wrote:
> >> > >
> >> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >> > >>> Can't we hijack the relocation records for these functions before 
> >> > >>> they
> >> > >>> get thrown out in the (final) link pass or something?
> >> > >>
> >> > >> I could be talking out my arse here, but I thought we could do this,
> >> > >> too, then changed my mind.  The relocation records give us the
> >> > >> location of the call or jump operand, but they don’t give the address
> >> > >> of the beginning of the instruction.
> >> > >
> >> > > But that's like 1 byte before the operand, right? We could even double 
> >> > > check
> >> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> >> > >
> >> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> >> > > ,
> >> > > so if we have the PLT32 location, we also have the instruction 
> >> > > location. Or am
> >> > > I missing something?
> >> >
> >> > There’s also JMP and Jcc, any of which can be used for rail calls, but
> >> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
> >> > form of any of these. So maybe we really can assume they’re all one
> >> > byte.
> >>
> >> Oh, I had not considered tail calls..
> >>
> >> > But there is a nasty potential special case: anything that takes the
> >> > function’s address. This includes jump tables, computed gotos, and
> >> > plain old function pointers. And I suspect that any of these could
> >> > have one of the rather large number of CALL/JMP/Jcc bytes before the
> >> > relocation by coincidence.
> >>
> >> We can have objtool verify the CALL/JMP/Jcc only condition. So if
> >> someone tries to take the address of a patchable function, it will error
> >> out.
> >
> > I think we should just ignore the sites that take the address and
> > maybe issue a warning.  After all, GCC can create them all by itself.
> > We'll always have a plain wrapper function, and I think we should just
> > not patch code that takes its address.  So we do, roughly:
> >
> > void default_foo(void);
> >
> > GLOBAL(foo)
> >   jmp *current_foo(%rip)
> > ENDPROC(foo)
> >
> > And code that does:
> >
> > foo();
> >
> > as a call, a tail call, a conditional tail call, etc, gets discovered
> > by objtool + relocation processing or whatever and gets patched.  (And
> > foo() itself gets patched, too, as a special case.  But we patch foo
> > itself at some point during boot to turn it into a direct JMP.  Doing
> > it this way means that the whole mechanism works from very early
> > boot.)
>
> Does that mean that architectures could opt out of doing the whole
> objtool + relocation processing thing, and instead take the hit of
> going through the trampoline for all calls?
>

I don't see why not.

--Andy

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 8 October 2018 at 19:25, Andy Lutomirski  wrote:
> On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
>>
>> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
>> >
>> >
>> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
>> > >
>> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
>> > >>> Can't we hijack the relocation records for these functions before they
>> > >>> get thrown out in the (final) link pass or something?
>> > >>
>> > >> I could be talking out my arse here, but I thought we could do this,
>> > >> too, then changed my mind.  The relocation records give us the
>> > >> location of the call or jump operand, but they don’t give the address
>> > >> of the beginning of the instruction.
>> > >
>> > > But that's like 1 byte before the operand, right? We could even double 
>> > > check
>> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
>> > >
>> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
>> > > ,
>> > > so if we have the PLT32 location, we also have the instruction location. 
>> > > Or am
>> > > I missing something?
>> >
>> > There’s also JMP and Jcc, any of which can be used for rail calls, but
>> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
>> > form of any of these. So maybe we really can assume they’re all one
>> > byte.
>>
>> Oh, I had not considered tail calls..
>>
>> > But there is a nasty potential special case: anything that takes the
>> > function’s address. This includes jump tables, computed gotos, and
>> > plain old function pointers. And I suspect that any of these could
>> > have one of the rather large number of CALL/JMP/Jcc bytes before the
>> > relocation by coincidence.
>>
>> We can have objtool verify the CALL/JMP/Jcc only condition. So if
>> someone tries to take the address of a patchable function, it will error
>> out.
>
> I think we should just ignore the sites that take the address and
> maybe issue a warning.  After all, GCC can create them all by itself.
> We'll always have a plain wrapper function, and I think we should just
> not patch code that takes its address.  So we do, roughly:
>
> void default_foo(void);
>
> GLOBAL(foo)
>   jmp *current_foo(%rip)
> ENDPROC(foo)
>
> And code that does:
>
> foo();
>
> as a call, a tail call, a conditional tail call, etc, gets discovered
> by objtool + relocation processing or whatever and gets patched.  (And
> foo() itself gets patched, too, as a special case.  But we patch foo
> itself at some point during boot to turn it into a direct JMP.  Doing
> it this way means that the whole mechanism works from very early
> boot.)

Does that mean that architectures could opt out of doing the whole
objtool + relocation processing thing, and instead take the hit of
going through the trampoline for all calls?

> And anything awful like:
>
> switch(whatever) {
> case 0:
>   foo();
> };
>
> that gets translated to a jump table and gets optimized such that it
> jumps straight to foo just gets left alone, since it still works.
> It's just a bit suboptimial.  Similarly, code that does:
>
> void (*ptr)(void);
> ptr = foo;
>
> gets a bona fide pointer to foo(), and any calls through the pointer
> land on foo() and jump to the current selected foo with only a single
> indirect branch / retpoline.
>
> Does this seem reasonable?  Is there a reason we should make it more
> restrictive?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 8 October 2018 at 19:25, Andy Lutomirski  wrote:
> On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
>>
>> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
>> >
>> >
>> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
>> > >
>> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
>> > >>> Can't we hijack the relocation records for these functions before they
>> > >>> get thrown out in the (final) link pass or something?
>> > >>
>> > >> I could be talking out my arse here, but I thought we could do this,
>> > >> too, then changed my mind.  The relocation records give us the
>> > >> location of the call or jump operand, but they don’t give the address
>> > >> of the beginning of the instruction.
>> > >
>> > > But that's like 1 byte before the operand, right? We could even double 
>> > > check
>> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
>> > >
>> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
>> > > ,
>> > > so if we have the PLT32 location, we also have the instruction location. 
>> > > Or am
>> > > I missing something?
>> >
>> > There’s also JMP and Jcc, any of which can be used for rail calls, but
>> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
>> > form of any of these. So maybe we really can assume they’re all one
>> > byte.
>>
>> Oh, I had not considered tail calls..
>>
>> > But there is a nasty potential special case: anything that takes the
>> > function’s address. This includes jump tables, computed gotos, and
>> > plain old function pointers. And I suspect that any of these could
>> > have one of the rather large number of CALL/JMP/Jcc bytes before the
>> > relocation by coincidence.
>>
>> We can have objtool verify the CALL/JMP/Jcc only condition. So if
>> someone tries to take the address of a patchable function, it will error
>> out.
>
> I think we should just ignore the sites that take the address and
> maybe issue a warning.  After all, GCC can create them all by itself.
> We'll always have a plain wrapper function, and I think we should just
> not patch code that takes its address.  So we do, roughly:
>
> void default_foo(void);
>
> GLOBAL(foo)
>   jmp *current_foo(%rip)
> ENDPROC(foo)
>
> And code that does:
>
> foo();
>
> as a call, a tail call, a conditional tail call, etc, gets discovered
> by objtool + relocation processing or whatever and gets patched.  (And
> foo() itself gets patched, too, as a special case.  But we patch foo
> itself at some point during boot to turn it into a direct JMP.  Doing
> it this way means that the whole mechanism works from very early
> boot.)

Does that mean that architectures could opt out of doing the whole
objtool + relocation processing thing, and instead take the hit of
going through the trampoline for all calls?

> And anything awful like:
>
> switch(whatever) {
> case 0:
>   foo();
> };
>
> that gets translated to a jump table and gets optimized such that it
> jumps straight to foo just gets left alone, since it still works.
> It's just a bit suboptimial.  Similarly, code that does:
>
> void (*ptr)(void);
> ptr = foo;
>
> gets a bona fide pointer to foo(), and any calls through the pointer
> land on foo() and jump to the current selected foo with only a single
> indirect branch / retpoline.
>
> Does this seem reasonable?  Is there a reason we should make it more
> restrictive?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
>
> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > >
> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > >>> Can't we hijack the relocation records for these functions before they
> > >>> get thrown out in the (final) link pass or something?
> > >>
> > >> I could be talking out my arse here, but I thought we could do this,
> > >> too, then changed my mind.  The relocation records give us the
> > >> location of the call or jump operand, but they don’t give the address
> > >> of the beginning of the instruction.
> > >
> > > But that's like 1 byte before the operand, right? We could even double 
> > > check
> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > >
> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > > ,
> > > so if we have the PLT32 location, we also have the instruction location. 
> > > Or am
> > > I missing something?
> >
> > There’s also JMP and Jcc, any of which can be used for rail calls, but
> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
> > form of any of these. So maybe we really can assume they’re all one
> > byte.
>
> Oh, I had not considered tail calls..
>
> > But there is a nasty potential special case: anything that takes the
> > function’s address. This includes jump tables, computed gotos, and
> > plain old function pointers. And I suspect that any of these could
> > have one of the rather large number of CALL/JMP/Jcc bytes before the
> > relocation by coincidence.
>
> We can have objtool verify the CALL/JMP/Jcc only condition. So if
> someone tries to take the address of a patchable function, it will error
> out.

I think we should just ignore the sites that take the address and
maybe issue a warning.  After all, GCC can create them all by itself.
We'll always have a plain wrapper function, and I think we should just
not patch code that takes its address.  So we do, roughly:

void default_foo(void);

GLOBAL(foo)
  jmp *current_foo(%rip)
ENDPROC(foo)

And code that does:

foo();

as a call, a tail call, a conditional tail call, etc, gets discovered
by objtool + relocation processing or whatever and gets patched.  (And
foo() itself gets patched, too, as a special case.  But we patch foo
itself at some point during boot to turn it into a direct JMP.  Doing
it this way means that the whole mechanism works from very early
boot.)  And anything awful like:

switch(whatever) {
case 0:
  foo();
};

that gets translated to a jump table and gets optimized such that it
jumps straight to foo just gets left alone, since it still works.
It's just a bit suboptimial.  Similarly, code that does:

void (*ptr)(void);
ptr = foo;

gets a bona fide pointer to foo(), and any calls through the pointer
land on foo() and jump to the current selected foo with only a single
indirect branch / retpoline.

Does this seem reasonable?  Is there a reason we should make it more
restrictive?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

On Mon, Oct 8, 2018 at 9:40 AM Peter Zijlstra  wrote:
>
> On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > >
> > > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > >>> Can't we hijack the relocation records for these functions before they
> > >>> get thrown out in the (final) link pass or something?
> > >>
> > >> I could be talking out my arse here, but I thought we could do this,
> > >> too, then changed my mind.  The relocation records give us the
> > >> location of the call or jump operand, but they don’t give the address
> > >> of the beginning of the instruction.
> > >
> > > But that's like 1 byte before the operand, right? We could even double 
> > > check
> > > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > >
> > > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > > ,
> > > so if we have the PLT32 location, we also have the instruction location. 
> > > Or am
> > > I missing something?
> >
> > There’s also JMP and Jcc, any of which can be used for rail calls, but
> > those are also one byte. I suppose GCC is unlikely to emit a prefixed
> > form of any of these. So maybe we really can assume they’re all one
> > byte.
>
> Oh, I had not considered tail calls..
>
> > But there is a nasty potential special case: anything that takes the
> > function’s address. This includes jump tables, computed gotos, and
> > plain old function pointers. And I suspect that any of these could
> > have one of the rather large number of CALL/JMP/Jcc bytes before the
> > relocation by coincidence.
>
> We can have objtool verify the CALL/JMP/Jcc only condition. So if
> someone tries to take the address of a patchable function, it will error
> out.

I think we should just ignore the sites that take the address and
maybe issue a warning.  After all, GCC can create them all by itself.
We'll always have a plain wrapper function, and I think we should just
not patch code that takes its address.  So we do, roughly:

void default_foo(void);

GLOBAL(foo)
  jmp *current_foo(%rip)
ENDPROC(foo)

And code that does:

foo();

as a call, a tail call, a conditional tail call, etc, gets discovered
by objtool + relocation processing or whatever and gets patched.  (And
foo() itself gets patched, too, as a special case.  But we patch foo
itself at some point during boot to turn it into a direct JMP.  Doing
it this way means that the whole mechanism works from very early
boot.)  And anything awful like:

switch(whatever) {
case 0:
  foo();
};

that gets translated to a jump table and gets optimized such that it
jumps straight to foo just gets left alone, since it still works.
It's just a bit suboptimial.  Similarly, code that does:

void (*ptr)(void);
ptr = foo;

gets a bona fide pointer to foo(), and any calls through the pointer
land on foo() and jump to the current selected foo with only a single
indirect branch / retpoline.

Does this seem reasonable?  Is there a reason we should make it more
restrictive?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but
> those are also one byte. I suppose GCC is unlikely to emit a prefixed
> form of any of these. So maybe we really can assume they’re all one
> byte.

Oh, I had not considered tail calls..

> But there is a nasty potential special case: anything that takes the
> function’s address. This includes jump tables, computed gotos, and
> plain old function pointers. And I suspect that any of these could
> have one of the rather large number of CALL/JMP/Jcc bytes before the
> relocation by coincidence.

We can have objtool verify the CALL/JMP/Jcc only condition. So if
someone tries to take the address of a patchable function, it will error
out.

Heck, it could initially even error out on tail calls.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Mon, Oct 08, 2018 at 09:29:56AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but
> those are also one byte. I suppose GCC is unlikely to emit a prefixed
> form of any of these. So maybe we really can assume they’re all one
> byte.

Oh, I had not considered tail calls..

> But there is a nasty potential special case: anything that takes the
> function’s address. This includes jump tables, computed gotos, and
> plain old function pointers. And I suspect that any of these could
> have one of the rather large number of CALL/JMP/Jcc bytes before the
> relocation by coincidence.

We can have objtool verify the CALL/JMP/Jcc only condition. So if
someone tries to take the address of a patchable function, it will error
out.

Heck, it could initially even error out on tail calls.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 09:29:56 -0700
Andy Lutomirski  wrote:

> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:  
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?  
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.  
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?  
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but those 
> are also one byte. I suppose GCC is unlikely to emit a prefixed form of any 
> of these. So maybe we really can assume they’re all one byte.
> 
> But there is a nasty potential special case: anything that takes the 
> function’s address. This includes jump tables, computed gotos, and plain old 
> function pointers. And I suspect that any of these could have one of the 
> rather large number of CALL/JMP/Jcc bytes before the relocation by 
> coincidence.
> 

FYI, your email client is horrible to read from decent email clients :-p

Anyway,

I'd like to have these "dynamic functions" be "special" where they
can't be added to tables or what not. If you need to add one to a
table or function pointer, then you need to have a wrapper function
that does the call. I think we can come up with some kind of wrapper or
linker magic to enforce this too.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 09:29:56 -0700
Andy Lutomirski  wrote:

> > On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> > 
> > On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:  
> >>> Can't we hijack the relocation records for these functions before they
> >>> get thrown out in the (final) link pass or something?  
> >> 
> >> I could be talking out my arse here, but I thought we could do this,
> >> too, then changed my mind.  The relocation records give us the
> >> location of the call or jump operand, but they don’t give the address
> >> of the beginning of the instruction.  
> > 
> > But that's like 1 byte before the operand, right? We could even double check
> > this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> > 
> > AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> > ,
> > so if we have the PLT32 location, we also have the instruction location. Or 
> > am
> > I missing something?  
> 
> There’s also JMP and Jcc, any of which can be used for rail calls, but those 
> are also one byte. I suppose GCC is unlikely to emit a prefixed form of any 
> of these. So maybe we really can assume they’re all one byte.
> 
> But there is a nasty potential special case: anything that takes the 
> function’s address. This includes jump tables, computed gotos, and plain old 
> function pointers. And I suspect that any of these could have one of the 
> rather large number of CALL/JMP/Jcc bytes before the relocation by 
> coincidence.
> 

FYI, your email client is horrible to read from decent email clients :-p

Anyway,

I'd like to have these "dynamic functions" be "special" where they
can't be added to tables or what not. If you need to add one to a
table or function pointer, then you need to have a wrapper function
that does the call. I think we can come up with some kind of wrapper or
linker magic to enforce this too.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 17:57:57 +0200
Peter Zijlstra  wrote:

> On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > > Can't we hijack the relocation records for these functions before they
> > > get thrown out in the (final) link pass or something?  
> > 
> > I could be talking out my arse here, but I thought we could do this,
> > too, then changed my mind.  The relocation records give us the
> > location of the call or jump operand, but they don’t give the address
> > of the beginning of the instruction.  
> 
> But that's like 1 byte before the operand, right? We could even double check
> this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> 
> AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> ,
> so if we have the PLT32 location, we also have the instruction location. Or am
> I missing something?

Yes, this is exactly what I was thinking of doing. All we need to do is
have objtool (or a modification of whatever we come up with), to find
the call sites of a specific function (we can have a table to look up
for), that creates a section listing all these call sites, and on boot
up, we can confirm that they are indeed calls (e8 operations).

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Steven Rostedt

On Mon, 8 Oct 2018 17:57:57 +0200
Peter Zijlstra  wrote:

> On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > > Can't we hijack the relocation records for these functions before they
> > > get thrown out in the (final) link pass or something?  
> > 
> > I could be talking out my arse here, but I thought we could do this,
> > too, then changed my mind.  The relocation records give us the
> > location of the call or jump operand, but they don’t give the address
> > of the beginning of the instruction.  
> 
> But that's like 1 byte before the operand, right? We could even double check
> this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> 
> AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> ,
> so if we have the PLT32 location, we also have the instruction location. Or am
> I missing something?

Yes, this is exactly what I was thinking of doing. All we need to do is
have objtool (or a modification of whatever we come up with), to find
the call sites of a specific function (we can have a table to look up
for), that creates a section listing all these call sites, and on boot
up, we can confirm that they are indeed calls (e8 operations).

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

> On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> 
> On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
>>> Can't we hijack the relocation records for these functions before they
>>> get thrown out in the (final) link pass or something?
>> 
>> I could be talking out my arse here, but I thought we could do this,
>> too, then changed my mind.  The relocation records give us the
>> location of the call or jump operand, but they don’t give the address
>> of the beginning of the instruction.
> 
> But that's like 1 byte before the operand, right? We could even double check
> this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> 
> AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> ,
> so if we have the PLT32 location, we also have the instruction location. Or am
> I missing something?

There’s also JMP and Jcc, any of which can be used for rail calls, but those 
are also one byte. I suppose GCC is unlikely to emit a prefixed form of any of 
these. So maybe we really can assume they’re all one byte.

But there is a nasty potential special case: anything that takes the function’s 
address. This includes jump tables, computed gotos, and plain old function 
pointers. And I suspect that any of these could have one of the rather large 
number of CALL/JMP/Jcc bytes before the relocation by coincidence.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

> On Oct 8, 2018, at 8:57 AM, Peter Zijlstra  wrote:
> 
> On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
>>> Can't we hijack the relocation records for these functions before they
>>> get thrown out in the (final) link pass or something?
>> 
>> I could be talking out my arse here, but I thought we could do this,
>> too, then changed my mind.  The relocation records give us the
>> location of the call or jump operand, but they don’t give the address
>> of the beginning of the instruction.
> 
> But that's like 1 byte before the operand, right? We could even double check
> this by reading back that byte and ensuring it is in fact 0xE8 (CALL).
> 
> AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 
> ,
> so if we have the PLT32 location, we also have the instruction location. Or am
> I missing something?

There’s also JMP and Jcc, any of which can be used for rail calls, but those 
are also one byte. I suppose GCC is unlikely to emit a prefixed form of any of 
these. So maybe we really can assume they’re all one byte.

But there is a nasty potential special case: anything that takes the function’s 
address. This includes jump tables, computed gotos, and plain old function 
pointers. And I suspect that any of these could have one of the rather large 
number of CALL/JMP/Jcc bytes before the relocation by coincidence.

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > Can't we hijack the relocation records for these functions before they
> > get thrown out in the (final) link pass or something?
> 
> I could be talking out my arse here, but I thought we could do this,
> too, then changed my mind.  The relocation records give us the
> location of the call or jump operand, but they don’t give the address
> of the beginning of the instruction.

But that's like 1 byte before the operand, right? We could even double check
this by reading back that byte and ensuring it is in fact 0xE8 (CALL).

AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 ,
so if we have the PLT32 location, we also have the instruction location. Or am
I missing something?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Mon, Oct 08, 2018 at 01:33:14AM -0700, Andy Lutomirski wrote:
> > Can't we hijack the relocation records for these functions before they
> > get thrown out in the (final) link pass or something?
> 
> I could be talking out my arse here, but I thought we could do this,
> too, then changed my mind.  The relocation records give us the
> location of the call or jump operand, but they don’t give the address
> of the beginning of the instruction.

But that's like 1 byte before the operand, right? We could even double check
this by reading back that byte and ensuring it is in fact 0xE8 (CALL).

AFAICT there is only the _1_ CALL encoding, and that is the 5 byte: E8 ,
so if we have the PLT32 location, we also have the instruction location. Or am
I missing something?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 6 October 2018 at 15:39, Steven Rostedt  wrote:
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
>
>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>> > +#define arch_dynfunc_trampoline(name, def) \
>> > +   asm volatile (  \
>> > +   ".globl dynfunc_" #name "; \n\t"\
>> > +   "dynfunc_" #name ": \n\t"   \
>> > +   "jmp " #def " \n\t" \
>> > +   ".balign 8 \n \t"   \
>> > +   : : : "memory" )
>>
>> Bah, what is it with you people and trampolines. Why can't we, just like
>> jump_label, patch the call directly?
>>
>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>> is slower for no real reason afaict.
>
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
>  ftrace/jump_function-v1 for how ugly it got (and it didn't work).
>
>>
>> Steve, also see:
>>
>>   https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
>
> Interesting. I don't have time to look at it at the moment to see what
> was done, but will do so in the near future.
>
> Remember, this was a proof of concept and even with the trampolines, it
> showed a great level of improvement. One thought was to do a
> "recordmcount.c" type of action to find where the calls were and patch
> them directly at boot up. I tried to keep the API the same where this
> could actually be done as an improvement later.
>
> Perhaps a gcc plugin might work too.
>
> I'll have to see what Ard did to handle the function parameters.
>

I didn't. My approach is just a generic function pointer which can be
overridden by the arch to be emitted as a trampoline instead.

Patching the call directly simply isn't feasible without compiler
support like we have with asm goto for jump_labels.

The use case I am proposing is providing different implementations of
crypto routines or CRC computation etc without having to rely on
function pointers, but still keep them as loadable modules. These
routines are a) heavy weight or we wouldn't bother providing
alternatives in the first place, and b) likely to have a considerable
I$ footprint already (and crypto libraries that have
encrypt/decrypt/setkey or init/update/final entry points will end up
with multiple trampolines in a single I-cache line). Also, the actual
patching only occurs on module load/unload.

I have no idea whether this reasoning applies to Steven's use case as
well, though, I haven't looked at his patches (I wasn't on cc)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Ard Biesheuvel

On 6 October 2018 at 15:39, Steven Rostedt  wrote:
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
>
>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>> > +#define arch_dynfunc_trampoline(name, def) \
>> > +   asm volatile (  \
>> > +   ".globl dynfunc_" #name "; \n\t"\
>> > +   "dynfunc_" #name ": \n\t"   \
>> > +   "jmp " #def " \n\t" \
>> > +   ".balign 8 \n \t"   \
>> > +   : : : "memory" )
>>
>> Bah, what is it with you people and trampolines. Why can't we, just like
>> jump_label, patch the call directly?
>>
>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>> is slower for no real reason afaict.
>
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
>  ftrace/jump_function-v1 for how ugly it got (and it didn't work).
>
>>
>> Steve, also see:
>>
>>   https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
>
> Interesting. I don't have time to look at it at the moment to see what
> was done, but will do so in the near future.
>
> Remember, this was a proof of concept and even with the trampolines, it
> showed a great level of improvement. One thought was to do a
> "recordmcount.c" type of action to find where the calls were and patch
> them directly at boot up. I tried to keep the API the same where this
> could actually be done as an improvement later.
>
> Perhaps a gcc plugin might work too.
>
> I'll have to see what Ard did to handle the function parameters.
>

I didn't. My approach is just a generic function pointer which can be
overridden by the arch to be emitted as a trampoline instead.

Patching the call directly simply isn't feasible without compiler
support like we have with asm goto for jump_labels.

The use case I am proposing is providing different implementations of
crypto routines or CRC computation etc without having to rely on
function pointers, but still keep them as loadable modules. These
routines are a) heavy weight or we wouldn't bother providing
alternatives in the first place, and b) likely to have a considerable
I$ footprint already (and crypto libraries that have
encrypt/decrypt/setkey or init/update/final entry points will end up
with multiple trampolines in a single I-cache line). Also, the actual
patching only occurs on module load/unload.

I have no idea whether this reasoning applies to Steven's use case as
well, though, I haven't looked at his patches (I wasn't on cc)

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

> On Oct 8, 2018, at 12:21 AM, Peter Zijlstra  wrote:
> 
>> On Sat, Oct 06, 2018 at 09:39:05AM -0400, Steven Rostedt wrote:
>> On Sat, 6 Oct 2018 14:12:11 +0200
>> Peter Zijlstra  wrote:
>> 
 On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
 +#define arch_dynfunc_trampoline(name, def)\
 +asm volatile (\
 +".globl dynfunc_" #name "; \n\t"\
 +"dynfunc_" #name ": \n\t"\
 +"jmp " #def " \n\t"\
 +".balign 8 \n \t"\
 +: : : "memory" )  
>>> 
>>> Bah, what is it with you people and trampolines. Why can't we, just like
>>> jump_label, patch the call directly?
>>> 
>>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>>> is slower for no real reason afaict.
>> 
>> My first attempt was to do just that. But to add a label at the
>> call site required handling all the parameters too. See my branch:
>> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
> 
> Can't we hijack the relocation records for these functions before they
> get thrown out in the (final) link pass or something?

I could be talking out my arse here, but I thought we could do this, too, then 
changed my mind.  The relocation records give us the location of the call or 
jump operand, but they don’t give the address of the beginning of the 
instruction. If the instruction crosses a cache line boundary, don’t we need to 
use the int3 patching trick?  And that requires knowing which byte to patch 
with int3.

Or am I wrong and can the CPUs we care about correctly handle a locked write to 
part of an instruction that crosses a cache line boundary?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Andy Lutomirski

> On Oct 8, 2018, at 12:21 AM, Peter Zijlstra  wrote:
> 
>> On Sat, Oct 06, 2018 at 09:39:05AM -0400, Steven Rostedt wrote:
>> On Sat, 6 Oct 2018 14:12:11 +0200
>> Peter Zijlstra  wrote:
>> 
 On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
 +#define arch_dynfunc_trampoline(name, def)\
 +asm volatile (\
 +".globl dynfunc_" #name "; \n\t"\
 +"dynfunc_" #name ": \n\t"\
 +"jmp " #def " \n\t"\
 +".balign 8 \n \t"\
 +: : : "memory" )  
>>> 
>>> Bah, what is it with you people and trampolines. Why can't we, just like
>>> jump_label, patch the call directly?
>>> 
>>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>>> is slower for no real reason afaict.
>> 
>> My first attempt was to do just that. But to add a label at the
>> call site required handling all the parameters too. See my branch:
>> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
> 
> Can't we hijack the relocation records for these functions before they
> get thrown out in the (final) link pass or something?

I could be talking out my arse here, but I thought we could do this, too, then 
changed my mind.  The relocation records give us the location of the call or 
jump operand, but they don’t give the address of the beginning of the 
instruction. If the instruction crosses a cache line boundary, don’t we need to 
use the int3 patching trick?  And that requires knowing which byte to patch 
with int3.

Or am I wrong and can the CPUs we care about correctly handle a locked write to 
part of an instruction that crosses a cache line boundary?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Sat, Oct 06, 2018 at 09:39:05AM -0400, Steven Rostedt wrote:
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
> 
> > On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> > > +#define arch_dynfunc_trampoline(name, def)   \
> > > + asm volatile (  \
> > > + ".globl dynfunc_" #name "; \n\t"\
> > > + "dynfunc_" #name ": \n\t"   \
> > > + "jmp " #def " \n\t" \
> > > + ".balign 8 \n \t"   \
> > > + : : : "memory" )  
> > 
> > Bah, what is it with you people and trampolines. Why can't we, just like
> > jump_label, patch the call directly?
> > 
> > The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
> > is slower for no real reason afaict.
> 
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
>  ftrace/jump_function-v1 for how ugly it got (and it didn't work).

Can't we hijack the relocation records for these functions before they
get thrown out in the (final) link pass or something?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-08 Thread Peter Zijlstra

On Sat, Oct 06, 2018 at 09:39:05AM -0400, Steven Rostedt wrote:
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
> 
> > On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> > > +#define arch_dynfunc_trampoline(name, def)   \
> > > + asm volatile (  \
> > > + ".globl dynfunc_" #name "; \n\t"\
> > > + "dynfunc_" #name ": \n\t"   \
> > > + "jmp " #def " \n\t" \
> > > + ".balign 8 \n \t"   \
> > > + : : : "memory" )  
> > 
> > Bah, what is it with you people and trampolines. Why can't we, just like
> > jump_label, patch the call directly?
> > 
> > The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
> > is slower for no real reason afaict.
> 
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
>  ftrace/jump_function-v1 for how ugly it got (and it didn't work).

Can't we hijack the relocation records for these functions before they
get thrown out in the (final) link pass or something?

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Sat, 6 Oct 2018 08:13:18 -0700
Andy Lutomirski  wrote:

> > Perhaps a gcc plugin might work too.
> >   
> 
> My suggestion was to have objtool do the dirty work. Josh said something 
> suspiciously like “sounds fun” on IRC :)
> 

objtool does basically the same thing as recordmcount does. Josh and I
have both said that it's on our todo list to combine the two and make it
more generic for operations like this.

Seems like now's the time to do it.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Sat, 6 Oct 2018 08:13:18 -0700
Andy Lutomirski  wrote:

> > Perhaps a gcc plugin might work too.
> >   
> 
> My suggestion was to have objtool do the dirty work. Josh said something 
> suspiciously like “sounds fun” on IRC :)
> 

objtool does basically the same thing as recordmcount does. Josh and I
have both said that it's on our todo list to combine the two and make it
more generic for operations like this.

Seems like now's the time to do it.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Fri, 5 Oct 2018 22:03:51 -0400
Steven Rostedt  wrote:

> On Fri, 05 Oct 2018 21:51:11 -0400
> Steven Rostedt  wrote:
> 
> > +#ifndef PARAMS
> > +#define PARAMS(x...) x
> > +#endif
> > +
> > +#ifndef ARGS
> > +#define ARGS(x...) x
> > +#endif
> > +  
> 
> This is also leftover from the first attempt and can be nuked.
>

I take this back. It is still needed.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Fri, 5 Oct 2018 22:03:51 -0400
Steven Rostedt  wrote:

> On Fri, 05 Oct 2018 21:51:11 -0400
> Steven Rostedt  wrote:
> 
> > +#ifndef PARAMS
> > +#define PARAMS(x...) x
> > +#endif
> > +
> > +#ifndef ARGS
> > +#define ARGS(x...) x
> > +#endif
> > +  
> 
> This is also leftover from the first attempt and can be nuked.
>

I take this back. It is still needed.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Andy Lutomirski




> On Oct 6, 2018, at 6:39 AM, Steven Rostedt  wrote:
> 
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
> 
>>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>>> +#define arch_dynfunc_trampoline(name, def)\
>>> +asm volatile (\
>>> +".globl dynfunc_" #name "; \n\t"\
>>> +"dynfunc_" #name ": \n\t"\
>>> +"jmp " #def " \n\t"\
>>> +".balign 8 \n \t"\
>>> +: : : "memory" )  
>> 
>> Bah, what is it with you people and trampolines. Why can't we, just like
>> jump_label, patch the call directly?
>> 
>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>> is slower for no real reason afaict.
> 
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
> 
>> 
>> Steve, also see:
>> 
>>  https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
> 
> Interesting. I don't have time to look at it at the moment to see what
> was done, but will do so in the near future.
> 
> Remember, this was a proof of concept and even with the trampolines, it
> showed a great level of improvement. One thought was to do a
> "recordmcount.c" type of action to find where the calls were and patch
> them directly at boot up. I tried to keep the API the same where this
> could actually be done as an improvement later.
> 
> Perhaps a gcc plugin might work too.
> 

My suggestion was to have objtool do the dirty work. Josh said something 
suspiciously like “sounds fun” on IRC :)

> I'll have to see what Ard did to handle the function parameters.
> 
> -- Steve
>

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Andy Lutomirski




> On Oct 6, 2018, at 6:39 AM, Steven Rostedt  wrote:
> 
> On Sat, 6 Oct 2018 14:12:11 +0200
> Peter Zijlstra  wrote:
> 
>>> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
>>> +#define arch_dynfunc_trampoline(name, def)\
>>> +asm volatile (\
>>> +".globl dynfunc_" #name "; \n\t"\
>>> +"dynfunc_" #name ": \n\t"\
>>> +"jmp " #def " \n\t"\
>>> +".balign 8 \n \t"\
>>> +: : : "memory" )  
>> 
>> Bah, what is it with you people and trampolines. Why can't we, just like
>> jump_label, patch the call directly?
>> 
>> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
>> is slower for no real reason afaict.
> 
> My first attempt was to do just that. But to add a label at the
> call site required handling all the parameters too. See my branch:
> ftrace/jump_function-v1 for how ugly it got (and it didn't work).
> 
>> 
>> Steve, also see:
>> 
>>  https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org
> 
> Interesting. I don't have time to look at it at the moment to see what
> was done, but will do so in the near future.
> 
> Remember, this was a proof of concept and even with the trampolines, it
> showed a great level of improvement. One thought was to do a
> "recordmcount.c" type of action to find where the calls were and patch
> them directly at boot up. I tried to keep the API the same where this
> could actually be done as an improvement later.
> 
> Perhaps a gcc plugin might work too.
> 

My suggestion was to have objtool do the dirty work. Josh said something 
suspiciously like “sounds fun” on IRC :)

> I'll have to see what Ard did to handle the function parameters.
> 
> -- Steve
>

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Sat, 6 Oct 2018 14:12:11 +0200
Peter Zijlstra  wrote:

> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> > +#define arch_dynfunc_trampoline(name, def) \
> > +   asm volatile (  \
> > +   ".globl dynfunc_" #name "; \n\t"\
> > +   "dynfunc_" #name ": \n\t"   \
> > +   "jmp " #def " \n\t" \
> > +   ".balign 8 \n \t"   \
> > +   : : : "memory" )  
> 
> Bah, what is it with you people and trampolines. Why can't we, just like
> jump_label, patch the call directly?
> 
> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
> is slower for no real reason afaict.

My first attempt was to do just that. But to add a label at the
call site required handling all the parameters too. See my branch:
 ftrace/jump_function-v1 for how ugly it got (and it didn't work).

> 
> Steve, also see:
> 
>   https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org

Interesting. I don't have time to look at it at the moment to see what
was done, but will do so in the near future.

Remember, this was a proof of concept and even with the trampolines, it
showed a great level of improvement. One thought was to do a
"recordmcount.c" type of action to find where the calls were and patch
them directly at boot up. I tried to keep the API the same where this
could actually be done as an improvement later.

Perhaps a gcc plugin might work too.

I'll have to see what Ard did to handle the function parameters.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Steven Rostedt

On Sat, 6 Oct 2018 14:12:11 +0200
Peter Zijlstra  wrote:

> On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> > +#define arch_dynfunc_trampoline(name, def) \
> > +   asm volatile (  \
> > +   ".globl dynfunc_" #name "; \n\t"\
> > +   "dynfunc_" #name ": \n\t"   \
> > +   "jmp " #def " \n\t" \
> > +   ".balign 8 \n \t"   \
> > +   : : : "memory" )  
> 
> Bah, what is it with you people and trampolines. Why can't we, just like
> jump_label, patch the call directly?
> 
> The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
> is slower for no real reason afaict.

My first attempt was to do just that. But to add a label at the
call site required handling all the parameters too. See my branch:
 ftrace/jump_function-v1 for how ugly it got (and it didn't work).

> 
> Steve, also see:
> 
>   https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org

Interesting. I don't have time to look at it at the moment to see what
was done, but will do so in the near future.

Remember, this was a proof of concept and even with the trampolines, it
showed a great level of improvement. One thought was to do a
"recordmcount.c" type of action to find where the calls were and patch
them directly at boot up. I tried to keep the API the same where this
could actually be done as an improvement later.

Perhaps a gcc plugin might work too.

I'll have to see what Ard did to handle the function parameters.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Peter Zijlstra

On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )

Bah, what is it with you people and trampolines. Why can't we, just like
jump_label, patch the call directly?

The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
is slower for no real reason afaict.

Steve, also see:

  https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-06 Thread Peter Zijlstra

On Fri, Oct 05, 2018 at 09:51:11PM -0400, Steven Rostedt wrote:
> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )

Bah, what is it with you people and trampolines. Why can't we, just like
jump_label, patch the call directly?

The whole call+jmp thing is silly, don't do that. It just wrecks I$ and
is slower for no real reason afaict.

Steve, also see:

  https://lkml.kernel.org/r/20181005081333.15018-1-ard.biesheu...@linaro.org

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#ifndef PARAMS
> +#define PARAMS(x...) x
> +#endif
> +
> +#ifndef ARGS
> +#define ARGS(x...) x
> +#endif
> +

This is also leftover from the first attempt and can be nuked.

Yeah, yeah, I should have reviewed my patches better before sending
them. But I was so excited that I got it working I just wanted to share
the joy!

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#ifndef PARAMS
> +#define PARAMS(x...) x
> +#endif
> +
> +#ifndef ARGS
> +#define ARGS(x...) x
> +#endif
> +

This is also leftover from the first attempt and can be nuked.

Yeah, yeah, I should have reviewed my patches better before sending
them. But I was so excited that I got it working I just wanted to share
the joy!

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )
> +

Note, the assembler can easily put in a two byte jump here. The .balign
was suppose to also have some padding (nop) incase that happens. It's
fine, because we can just replace it with a 5 byte jump, as long as we
have 3 bytes afterward if it is a two byte jump.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> +#define arch_dynfunc_trampoline(name, def)   \
> + asm volatile (  \
> + ".globl dynfunc_" #name "; \n\t"\
> + "dynfunc_" #name ": \n\t"   \
> + "jmp " #def " \n\t" \
> + ".balign 8 \n \t"   \
> + : : : "memory" )
> +

Note, the assembler can easily put in a two byte jump here. The .balign
was suppose to also have some padding (nop) incase that happens. It's
fine, because we can just replace it with a 5 byte jump, as long as we
have 3 bytes afterward if it is a two byte jump.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (VMware)" 
> 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  include/asm-generic/vmlinux.lds.h |   4 +
>  include/linux/jump_function.h |  93 
>  kernel/Makefile   |   2 +-
>  kernel/jump_function.c| 368 ++
>  4 files changed, 466 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/jump_function.h
>  create mode 100644 kernel/jump_function.c
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 7b75ff6e2fce..0e205069ff36 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -257,6 +257,10 @@
>   __start___jump_table = .;   \
>   KEEP(*(__jump_table))   \
>   __stop___jump_table = .;\
> + . = ALIGN(8);   \
> + __start___dynfunc_table = .;\
> + KEEP(*(__dynfunc_table))\
> + __stop___dynfunc_table = .; \
>   . = ALIGN(8);   \
>   __start___verbose = .;  \
>   KEEP(*(__verbose))  \
>

BAH, this is leftover from my first attempt. It's not needed and can be
nuked.

-- Steve

Re: [POC][RFC][PATCH 1/2] jump_function: Addition of new feature "jump_function"

2018-10-05 Thread Steven Rostedt

On Fri, 05 Oct 2018 21:51:11 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (VMware)" 
> 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  include/asm-generic/vmlinux.lds.h |   4 +
>  include/linux/jump_function.h |  93 
>  kernel/Makefile   |   2 +-
>  kernel/jump_function.c| 368 ++
>  4 files changed, 466 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/jump_function.h
>  create mode 100644 kernel/jump_function.c
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 7b75ff6e2fce..0e205069ff36 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -257,6 +257,10 @@
>   __start___jump_table = .;   \
>   KEEP(*(__jump_table))   \
>   __stop___jump_table = .;\
> + . = ALIGN(8);   \
> + __start___dynfunc_table = .;\
> + KEEP(*(__dynfunc_table))\
> + __stop___dynfunc_table = .; \
>   . = ALIGN(8);   \
>   __start___verbose = .;  \
>   KEEP(*(__verbose))  \
>

BAH, this is leftover from my first attempt. It's not needed and can be
nuked.

-- Steve

80 matches

Mail list logo