Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Steven Rostedt
On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra  wrote:

> On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> > On Fri, 18 Oct 2019 09:35:40 +0200
> > Peter Zijlstra  wrote:
> >   
> > > Now that set_all_modules_text_*() is gone, nothing depends on the
> > > relation between ->state = COMING and the protection state anymore.
> > > This enables moving the protection changes later, such that the COMING
> > > notifier callbacks can more easily modify the text.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > Cc: Jessica Yu 
> > > ---  
> > 
> > This triggered the following bug:
> >   
> 
> > The trace_event_define_fields_() is defined in
> > include/trace/trace_events.h and is an init function called by the
> > trace_events event_create_dir() via the module notifier:
> > MODULE_STATE_COMING  
> 
> The below seems to cure it; and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
> 
> It has just one section mismatch report that I'm too tired to look at
> just now.
> 
> I'm not particularly proud of the "__function__" hack, but it works :/ I
> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
> 
> (also, win on LOC)
> 
>

After applying this series and this patch I triggered this:

[ 1397.281889] BUG: kernel NULL pointer dereference, address: 0001
[ 1397.288896] #PF: supervisor read access in kernel mode
[ 1397.294062] #PF: error_code(0x) - not-present page
[ 1397.299192] PGD 0 P4D 0 
[ 1397.301728] Oops:  [#1] PREEMPT SMP PTI
[ 1397.305908] CPU: 7 PID: 4252 Comm: ftracetest Not tainted 5.4.0-rc3-test+ 
#132
[ 1397.313114] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS 
K01 v03.03 07/14/2016
[ 1397.322056] RIP: 0010:event_create_dir+0x26a/0x520
[ 1397.326841] Code: ff ff 5a 85 c0 75 37 44 03 7b 10 48 83 c3 20 4c 8b 13 4d 
85 d2 0f 84 66 fe ff ff b9 0d 00 00 00 4c 89 d6 4c 89 f7 48 8b 53 08  a6 0f 
97 c1 80 d9 00 84 c9 75 a5 48 89 ef e8 b2 d4 a3 00 85 c0
[ 1397.345558] RSP: 0018:c9a63d18 EFLAGS: 00010202
[ 1397.350775] RAX:  RBX: c9a63d80 RCX: 000d
[ 1397.357893] RDX: 811ccfac RSI: 0001 RDI: 8207c68a
[ 1397.365006] RBP: 888114b1c750 R08:  R09: 8881147561b0
[ 1397.372119] R10: 0001 R11: 0001 R12: 8880b1d80528
[ 1397.379243] R13: 88811189aeb0 R14: 8207c68a R15: 811d2d22
[ 1397.386365] FS:  7f2567213740() GS:88811a9c() 
knlGS:
[ 1397.394437] CS:  0010 DS:  ES:  CR0: 80050033
[ 1397.400174] CR2: 0001 CR3: b1f06005 CR4: 001606e0
[ 1397.407297] Call Trace:
[ 1397.409753]  trace_add_event_call+0x6c/0xb0
[ 1397.413938]  trace_probe_register_event_call+0x22/0x50
[ 1397.419071]  trace_kprobe_create+0x65c/0xa20
[ 1397.423340]  ? argv_split+0x99/0x130
[ 1397.426913]  ? __kmalloc+0x1d4/0x2c0
[ 1397.430485]  ? trace_kprobe_create+0xa20/0xa20
[ 1397.434922]  ? trace_kprobe_create+0xa20/0xa20
[ 1397.439361]  create_or_delete_trace_kprobe+0xd/0x30
[ 1397.444237]  trace_run_command+0x72/0x90
[ 1397.448158]  trace_parse_run_command+0xaf/0x131
[ 1397.452684]  vfs_write+0xa5/0x1a0
[ 1397.455996]  ksys_write+0x5c/0xd0
[ 1397.459312]  do_syscall_64+0x48/0x120
[ 1397.462971]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1397.468017] RIP: 0033:0x7f2567303ff8

By running tools/selftests/ftrace/ftracetest

Crashed here:

[33] Kprobe dynamic event - adding and removing [PASS]
[34] Kprobe dynamic event - busy event check[PASS]
[35] Kprobe event with comm arguments   [FAIL]
[36] Kprobe event string type argumentclient_loop:

-- Steve


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Josh Poimboeuf
On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
>  
>   val = sym->st_value + rel[i].r_addend;
>  
> + /*
> +  * .klp.rela.* sections should only contain module
> +  * related RELAs. All core-kernel RELAs should be in
> +  * normal .rela.* sections and be applied when loading
> +  * the patch module itself.
> +  */
> + WARN_ON_ONCE(klp && core_kernel_text(val));
> +

This isn't quite true, we also use .klp.rela sections to access
unexported vmlinux symbols.

-- 
Josh



Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Josh Poimboeuf
On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> Now sadly that commit missed all the useful information, luckily I could
> find the patch in my LKML folder, more sad, that thread still didn't
> contain the actual useful information, for that I was directed to
> github:
> 
>   https://github.com/dynup/kpatch/issues/580
> 
> Now, someone is owning me a beer for having to look at github for this.

Deal.  And you probably deserve a few more for fixing our crap.

The github thing is supposed to be temporary, at least in theory we'll
eventually have all klp patch module building code in the kernel tree.

> That finally explained that what happens is that the RELA was trying to
> fix up the paravirt indirect call to 'local_irq_disable', which
> apply_paravirt() will have overwritten with 'CLI; NOP'. This then
> obviously goes *bang*.
> 
> This then raises a number of questions:
> 
>  1) why is that RELA (that obviously does not depend on any module)
> applied so late?

Good question.  The 'pv_ops' symbol is exported by the core kernel, so I
can't see any reason why we'd need to apply that rela late.  In theory,
kpatch-build isn't supposed to convert that to a klp rela.  Maybe
something went wrong in the patch creation code.

I'm also questioning why we even need to apply the parainstructions
section late.  Maybe we can remove that apply_paravirt() call
altogether, along with .klp.arch.parainstruction sections.

I'll need to look into it...

>  2) why can't we unconditionally skip RELA's to paravirt sites?

We could, but I don't think it's needed if we fix #1.

>  3) Is there ever a possible module-dependent RELA to a paravirt /
> alternative site?

Good question...

> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general).

That was already the goal, but we've apparently failed at that.

> We can fix up RELAs that depend on core kernel early without problems.
> Let them be in the normal .rela sections and be fixed up on loading
> the patch-module as per usual.

If such symbols aren't exported, then they still need to be in
.klp.rela.vmlinux sections, since normal relas won't work.

> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
> 
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.

I'm not sure about alternatives, but maybe we can enforce such
limitations with tooling and/or kernel checks.

-- 
Josh



Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Peter Zijlstra
On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> 
> > > Doesn't livepatch code also need to be modified?  We have:
> > 
> > Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> > arm64/ftrace and klp are the only two users of that function (outside of
> > module.c) and Mark was already writing a patch for arm64.
> > 
> > Means these last two patches need to wait a little until we've fixed
> > those.
> 
> So the other KLP issue:
> 
>  peterz: ad klp, apply_relocate_add() and text_poke()... what
>  about apply_alternatives() and apply_paravirt()? They call
>  text_poke_early(), which was ok with module_disable/enable_ro(), but
>now it is not, I suppose. See arch_klp_init_object_loaded() in
>  arch/x86/kernel/livepatch.c.
> 
>  mbenes: hurm, I don't see why we would need to do
>  apply_alternatives() / apply_paravirt() later, why can't we do that
>the moment we load the module
> 
>  mbenes: that is, those things _never_ change after boot
> 
>  peterz: as jpoimboe explained. See commit
>  d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.
> 
> Now sadly that commit missed all the useful information, luckily I could
> find the patch in my LKML folder, more sad, that thread still didn't
> contain the actual useful information, for that I was directed to
> github:
> 
>   https://github.com/dynup/kpatch/issues/580
> 
> Now, someone is owning me a beer for having to look at github for this.
> 
> That finally explained that what happens is that the RELA was trying to
> fix up the paravirt indirect call to 'local_irq_disable', which
> apply_paravirt() will have overwritten with 'CLI; NOP'. This then
> obviously goes *bang*.
> 
> This then raises a number of questions:
> 
>  1) why is that RELA (that obviously does not depend on any module)
> applied so late?
> 
>  2) why can't we unconditionally skip RELA's to paravirt sites?
> 
>  3) Is there ever a possible module-dependent RELA to a paravirt /
> alternative site?
> 
> 
> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general). We can
> fix up RELAs that depend on core kernel early without problems. Let them
> be in the normal .rela sections and be fixed up on loading the
> patch-module as per usual.
> 
> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
> 
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.
> 
> Hmmm ?

Something like so on top, I suppose.

---

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -131,7 +131,8 @@ static int __apply_relocate_add(Elf64_Sh
   unsigned int symindex,
   unsigned int relsec,
   struct module *me,
-  void *(*write)(void *addr, const void *val, size_t size))
+  void *(*write)(void *addr, const void *val, size_t size),
+  bool klp)
 {
unsigned int i;
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
 
val = sym->st_value + rel[i].r_addend;
 
+   /*
+* .klp.rela.* sections should only contain module
+* related RELAs. All core-kernel RELAs should be in
+* normal .rela.* sections and be applied when loading
+* the patch module itself.
+*/
+   WARN_ON_ONCE(klp && core_kernel_text(val));
+
switch (ELF64_R_TYPE(rel[i].r_info)) {
case R_X86_64_NONE:
break;
@@ -223,7 +232,7 @@ int apply_relocate_add(Elf64_Shdr *sechd
   unsigned int relsec,
   struct module *me)
 {
-   return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
memcpy);
+   return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
memcpy, false);
 }
 
 int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
@@ -234,7 +243,7 @@ int klp_apply_relocate_add(Elf64_Shdr *s
 {
int ret;
 
-   ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
text_poke);
+   ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
text_poke, true);
if (!ret)
text_poke_sync();
 


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:

> > Doesn't livepatch code also need to be modified?  We have:
> 
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
> 
> Means these last two patches need to wait a little until we've fixed
> those.

So the other KLP issue:

 peterz: ad klp, apply_relocate_add() and text_poke()... what
 about apply_alternatives() and apply_paravirt()? They call
 text_poke_early(), which was ok with module_disable/enable_ro(), but
 now it is not, I suppose. See arch_klp_init_object_loaded() in
 arch/x86/kernel/livepatch.c.

 mbenes: hurm, I don't see why we would need to do
 apply_alternatives() / apply_paravirt() later, why can't we do that
 the moment we load the module

 mbenes: that is, those things _never_ change after boot

 peterz: as jpoimboe explained. See commit
 d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.

Now sadly that commit missed all the useful information, luckily I could
find the patch in my LKML folder, more sad, that thread still didn't
contain the actual useful information, for that I was directed to
github:

  https://github.com/dynup/kpatch/issues/580

Now, someone is owning me a beer for having to look at github for this.

That finally explained that what happens is that the RELA was trying to
fix up the paravirt indirect call to 'local_irq_disable', which
apply_paravirt() will have overwritten with 'CLI; NOP'. This then
obviously goes *bang*.

This then raises a number of questions:

 1) why is that RELA (that obviously does not depend on any module)
applied so late?

 2) why can't we unconditionally skip RELA's to paravirt sites?

 3) Is there ever a possible module-dependent RELA to a paravirt /
alternative site?


Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
RELAs that depend on symbols in ${mod} (or modules in general). We can
fix up RELAs that depend on core kernel early without problems. Let them
be in the normal .rela sections and be fixed up on loading the
patch-module as per usual.

This should also deal with 2, paravirt should always have RELAs into the
core kernel.

Then for 3) we only have alternatives left, and I _think_ it unlikely to
be the case, but I'll have to have a hard look at that.

Hmmm ?


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-23 Thread Peter Zijlstra
On Tue, Oct 22, 2019 at 04:40:23PM -0400, Steven Rostedt wrote:
> On Tue, 22 Oct 2019 22:24:01 +0200
> Peter Zijlstra  wrote:

> > I'm not particularly proud of the "__function__" hack, but it works :/ I
> 
> If anything, that should be defined as a macro:
> 
> #define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"
> 
> And use that to test?

Possibly, also, we should probably start with a character that C doesn't
allow in typenames, like '$'.

That way we can have a much shorter string and still be certain it will
never conflict; "$func" comes to mind.

> > couldn't come up with anything else for [uk]probes which seem to have
> > dynamic fields and if we're having it then syscall_enter can also make
> > use of it, the syscall_metadata crud was going to be ugly otherwise.
> > 
> > (also, win on LOC)
> 
> I'm more worried about text/data bloat. But if anything, we may be able
> to deal with that later.

We use almost the exact same data the function would've used, except we
don't have the actual function. I just don't see how it can be more.


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-22 Thread Steven Rostedt
On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra  wrote:

> The below seems to cure it; and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
> 
> It has just one section mismatch report that I'm too tired to look at
> just now.

Thanks, I'll try to take a look at it tomorrow.

> 
> I'm not particularly proud of the "__function__" hack, but it works :/ I

If anything, that should be defined as a macro:

#define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"

And use that to test?

> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
> 
> (also, win on LOC)

I'm more worried about text/data bloat. But if anything, we may be able
to deal with that later.

-- Steve


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-22 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> On Fri, 18 Oct 2019 09:35:40 +0200
> Peter Zijlstra  wrote:
> 
> > Now that set_all_modules_text_*() is gone, nothing depends on the
> > relation between ->state = COMING and the protection state anymore.
> > This enables moving the protection changes later, such that the COMING
> > notifier callbacks can more easily modify the text.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Cc: Jessica Yu 
> > ---
> 
> This triggered the following bug:
> 

> The trace_event_define_fields_() is defined in
> include/trace/trace_events.h and is an init function called by the
> trace_events event_create_dir() via the module notifier:
> MODULE_STATE_COMING

The below seems to cure it; and seems to generate identical
events/*/format output (for my .config, with the exception of ID).

It has just one section mismatch report that I'm too tired to look at
just now.

I'm not particularly proud of the "__function__" hack, but it works :/ I
couldn't come up with anything else for [uk]probes which seem to have
dynamic fields and if we're having it then syscall_enter can also make
use of it, the syscall_metadata crud was going to be ugly otherwise.

(also, win on LOC)

---
 fs/xfs/xfs_trace.h |   4 +-
 include/linux/trace_events.h   |  16 ++-
 include/trace/events/filemap.h |   2 +-
 include/trace/trace_events.h   |  64 -
 kernel/trace/trace.h   |  31 ++--
 kernel/trace/trace_entries.h   |  66 +++--
 kernel/trace/trace_events.c|  20 +++-
 kernel/trace/trace_export.c| 106 +++--
 kernel/trace/trace_kprobe.c|  12 -
 kernel/trace/trace_syscalls.c  |  48 +++
 kernel/trace/trace_uprobe.c|   6 ++-
 11 files changed, 162 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
-   __field(void *, leaf);
-   __field(int, pos);
+   __field(void *, leaf)
+   __field(int, pos)
__field(xfs_fileoff_t, startoff)
__field(xfs_fsblock_t, startblock)
__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..30782e78b91d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,20 @@ enum trace_reg {
 
 struct trace_event_call;
 
+struct trace_event_fields {
+   const char *type;
+   union {
+   struct {
+   const char *name;
+   const int  size;
+   const int  align;
+   const int  is_signed;
+   const int  filter_type;
+   };
+   int (*define_fields)(struct trace_event_call *);
+   };
+};
+
 struct trace_event_class {
const char  *system;
void*probe;
@@ -195,7 +209,7 @@ struct trace_event_class {
 #endif
int (*reg)(struct trace_event_call *event,
   enum trace_reg type, void *data);
-   int (*define_fields)(struct trace_event_call *);
+   struct trace_event_fields *fields_array;
struct list_head*(*get_fields)(struct trace_event_call *);
struct list_headfields;
int (*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
TP_ARGS(file, old),
 
TP_STRUCT__entry(
-   __field(struct file *, file);
+   __field(struct file *, file)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
__field(errseq_t, old)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions 
trace_event_type_funcs_##call = { \
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef __field_ext
-#define __field_ext(type, item, filter_type)   \
-   ret = trace_define_field(event_call, #type, #item,  \
-offsetof(typeof(field), item), \
-sizeof(field.item),\
-  

Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-22 Thread Peter Zijlstra
On Tue, Oct 22, 2019 at 01:31:16PM +0200, Heiko Carstens wrote:
> On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> > 
> > > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > > the below should be sufficient.
> > > 
> > > Completely untested...
> > 
> > And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
> > less tested s390 bits included below.
> > 
> > Heiko, apologies if I completely wrecked it.
> > 
> > The purpose is to remove module_disable_ro()/module_enable_ro() from
> > livepatch/core.c such that:
> > 
> >  - nothing relies on where in the module loading path module text goes RX.
> >  - nothing ever has writable text
> 
> Given that Steven reported a crash, I assume I can wait until you
> repost a new version of the series, which also includes s390 bits?

His crash is somewhat orthogonal, but yes, I will repost once sorted.


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-22 Thread Heiko Carstens
On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> 
> > So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > the below should be sufficient.
> > 
> > Completely untested...
> 
> And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
> less tested s390 bits included below.
> 
> Heiko, apologies if I completely wrecked it.
> 
> The purpose is to remove module_disable_ro()/module_enable_ro() from
> livepatch/core.c such that:
> 
>  - nothing relies on where in the module loading path module text goes RX.
>  - nothing ever has writable text

Given that Steven reported a crash, I assume I can wait until you
repost a new version of the series, which also includes s390 bits?



Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Steven Rostedt
On Fri, 18 Oct 2019 09:35:40 +0200
Peter Zijlstra  wrote:

> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Jessica Yu 
> ---

This triggered the following bug:

 BUG: unable to handle page fault for address: a01501f1
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0011) - permissions violation
 PGD 2a16067 P4D 2a16067 PUD 2a17063 PMD c230c067 PTE 8000c4d74063
 Oops: 0011 [#1] PREEMPT SMP KASAN PTI
 CPU: 2 PID: 638 Comm: systemd-udevd Not tainted 5.4.0-rc3-test+ #98
 ACPI: If an ACPI driver is available for this device, you should use it 
instead of the native driver
 ACPI Warning: SystemIO range 0x0530-0x053F conflicts 
with OpRegion 0x0500-0x0563 (\GPIO) 
(20190816/utaddress-213)
 ACPI: If an ACPI driver is available for this device, you should use it 
instead of the native driver
 ACPI Warning: SystemIO range 0x0500-0x052F conflicts 
with OpRegion 0x0500-0x0563 (\GPIO) 
(20190816/utaddress-213)
 ACPI: If an ACPI driver is available for this device, you should use it 
instead of the native driver
 lpc_ich: Resource conflict(s) found affecting gpio_ich
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 
07/14/2016
 RIP: 0010:trace_event_define_fields_i2c_result+0x0/0x86 [i2c_core]
 Code: 27 6a 00 48 c7 c2 60 34 13 a0 45 31 c9 48 89 df 41 b8 02 00 00 00 b9 12 
00 00 00 48 c7 c6 a0 33 13 a0 e8 02 ec 14 e1 5a 5b c3 <53> 48 c7 c6 20 33 13 a0 
b9 08 00 00 00 41
0 6a 00 41
 RSP: 0018:8880cba07950 EFLAGS: 00010246
 RAX: a01501f1 RBX: a013da40 RCX: 812a147c
 RDX: dc00 RSI: 0008 RDI: a013da40
 RBP: a0142be0 R08: ed1017fde1ab R09: ed1017fde1ab
 R10: ed1017fde1aa R11: 8880bfef0d57 R12: 8880cc22a000
 R13: a013da50 R14: a0137aa8 R15: 8880cd372c60
 FS:  7f062a48f940() GS:8880d468() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: a01501f1 CR3: cb632003 CR4: 001606e0
 Call Trace:
  event_create_dir+0x358/0x7b0
  trace_module_notify+0x20b/0x240
  notifier_call_chain+0x6d/0xa0
  blocking_notifier_call_chain+0x5e/0x80
  load_module+0x39a5/0x3d80
  ? module_frob_arch_sections+0x20/0x20
  ? vfs_read+0xcc/0x1b0
  ? kernel_read+0x95/0xb0
  ? kernel_read_file+0x187/0x310
  ? find_held_lock+0xac/0xd0
  ? syscall_trace_enter+0x369/0x590
  ? __do_sys_finit_module+0x11a/0x1b0
  __do_sys_finit_module+0x11a/0x1b0
  ? __ia32_sys_init_module+0x40/0x40
  ? trace_hardirqs_on+0x2e/0x120
  ? ktime_get_coarse_real_ts64+0x6c/0xf0
  ? syscall_trace_enter+0x233/0x590
  ? do_syscall_64+0x14/0x1a0
  do_syscall_64+0x68/0x1a0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Attached config, but it seems to be triggered with modules that have
trace events defined in them.

The trace_event_define_fields_() is defined in
include/trace/trace_events.h and is an init function called by the
trace_events event_create_dir() via the module notifier:
MODULE_STATE_COMING

-- Steve


config.gz
Description: application/gzip


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:

> So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.
> 
> Completely untested...

And because s390 also has: HAVE_LIVEPATCH and STRICT_MODULE_RWX the even
less tested s390 bits included below.

Heiko, apologies if I completely wrecked it.

The purpose is to remove module_disable_ro()/module_enable_ro() from
livepatch/core.c such that:

 - nothing relies on where in the module loading path module text goes RX.
 - nothing ever has writable text

> ---
>  arch/x86/kernel/module.c  | 40 +---
>  include/linux/livepatch.h |  7 +++
>  kernel/livepatch/core.c   | 14 ++
>  3 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index d5c72cb877b3..76fa2c5f2d7b 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs,
>   return 0;
>  }
>  #else /*X86_64*/
> -int apply_relocate_add(Elf64_Shdr *sechdrs,
> +int __apply_relocate_add(Elf64_Shdr *sechdrs,
>  const char *strtab,
>  unsigned int symindex,
>  unsigned int relsec,
> -struct module *me)
> +struct module *me,
> +void *(*write)(void *addr, const void *val, size_t size))
>  {
>   unsigned int i;
>   Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
> @@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>   case R_X86_64_64:
>   if (*(u64 *)loc != 0)
>   goto invalid_relocation;
> - *(u64 *)loc = val;
> + write(loc, , 8);
>   break;
>   case R_X86_64_32:
>   if (*(u32 *)loc != 0)
>   goto invalid_relocation;
> - *(u32 *)loc = val;
> + write(loc, , 4);
>   if (val != *(u32 *)loc)
>   goto overflow;
>   break;
>   case R_X86_64_32S:
>   if (*(s32 *)loc != 0)
>   goto invalid_relocation;
> - *(s32 *)loc = val;
> + write(loc, , 4);
>   if ((s64)val != *(s32 *)loc)
>   goto overflow;
>   break;
> @@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>   if (*(u32 *)loc != 0)
>   goto invalid_relocation;
>   val -= (u64)loc;
> - *(u32 *)loc = val;
> + write(loc, , 4);
>  #if 0
>   if ((s64)val != *(s32 *)loc)
>   goto overflow;
> @@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>   if (*(u64 *)loc != 0)
>   goto invalid_relocation;
>   val -= (u64)loc;
> - *(u64 *)loc = val;
> + write(loc, , 8);
>   break;
>   default:
>   pr_err("%s: Unknown rela relocation: %llu\n",
> @@ -215,6 +216,31 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  me->name);
>   return -ENOEXEC;
>  }
> +
> +int apply_relocate_add(Elf64_Shdr *sechdrs,
> +const char *strtab,
> +unsigned int symindex,
> +unsigned int relsec,
> +struct module *me)
> +{
> + return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
> memcpy);
> +}
> +
> +int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
> +const char *strtab,
> +unsigned int symindex,
> +unsigned int relsec,
> +struct module *me)
> +{
> + int ret;
> +
> + ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, 
> text_poke);
> + if (!ret)
> + text_poke_sync();
> +
> + return ret;
> +}
> +
>  #endif
>  
>  int module_finalize(const Elf_Ehdr *hdr,
> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index 273400814020..5b8c10871b70 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -217,6 +217,13 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long 
> id,
>  void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
>  void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);
>  
> +
> +extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
> +   const char *strtab,
> +   unsigned int 

Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.

And... s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, so let me
poke at that.


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> > On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > > Now that set_all_modules_text_*() is gone, nothing depends on the
> > > relation between ->state = COMING and the protection state anymore.
> > > This enables moving the protection changes later, such that the COMING
> > > notifier callbacks can more easily modify the text.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > Cc: Jessica Yu 
> > > ---
> > >  kernel/module.c |8 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
> > >   /* This relies on module_mutex for list integrity. */
> > >   module_bug_finalize(info->hdr, info->sechdrs, mod);
> > >  
> > > - module_enable_ro(mod, false);
> > > - module_enable_nx(mod);
> > > - module_enable_x(mod);
> > > -
> > >   /* Mark state as coming so strong_try_module_get() ignores us,
> > >* but kallsyms etc. can see us. */
> > >   mod->state = MODULE_STATE_COMING;
> > > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
> > >   if (err)
> > >   goto bug_cleanup;
> > >  
> > > + module_enable_ro(mod, false);
> > > + module_enable_nx(mod);
> > > + module_enable_x(mod);
> > > +
> > >   /* Module is ready to execute: parsing args may do that. */
> > >   after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> > > -32768, 32767, mod,
> > 
> > [ Sorry if this was already discussed, I still have a large backlog. ]
> > 
> > Doesn't livepatch code also need to be modified?  We have:
> 
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
> 
> Means these last two patches need to wait a little until we've fixed
> those.

So On IRC Josh suggested we use text_poke() for RELA. Since KLP is only
available on Power and x86, and Power does not have STRICT_MODULE_RWX,
the below should be sufficient.

Completely untested...

---
 arch/x86/kernel/module.c  | 40 +---
 include/linux/livepatch.h |  7 +++
 kernel/livepatch/core.c   | 14 ++
 3 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index d5c72cb877b3..76fa2c5f2d7b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs,
return 0;
 }
 #else /*X86_64*/
-int apply_relocate_add(Elf64_Shdr *sechdrs,
+int __apply_relocate_add(Elf64_Shdr *sechdrs,
   const char *strtab,
   unsigned int symindex,
   unsigned int relsec,
-  struct module *me)
+  struct module *me,
+  void *(*write)(void *addr, const void *val, size_t size))
 {
unsigned int i;
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
case R_X86_64_64:
if (*(u64 *)loc != 0)
goto invalid_relocation;
-   *(u64 *)loc = val;
+   write(loc, , 8);
break;
case R_X86_64_32:
if (*(u32 *)loc != 0)
goto invalid_relocation;
-   *(u32 *)loc = val;
+   write(loc, , 4);
if (val != *(u32 *)loc)
goto overflow;
break;
case R_X86_64_32S:
if (*(s32 *)loc != 0)
goto invalid_relocation;
-   *(s32 *)loc = val;
+   write(loc, , 4);
if ((s64)val != *(s32 *)loc)
goto overflow;
break;
@@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
if (*(u32 *)loc != 0)
goto invalid_relocation;
val -= (u64)loc;
-   *(u32 *)loc = val;
+   write(loc, , 4);
 #if 0
if ((s64)val != *(s32 *)loc)
goto overflow;
@@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
if (*(u64 *)loc != 0)
goto invalid_relocation;
val -= (u64)loc;
-   *(u64 *)loc = val;
+   write(loc, , 8);
break;
default:
pr_err("%s: 

Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Peter Zijlstra
On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > Now that set_all_modules_text_*() is gone, nothing depends on the
> > relation between ->state = COMING and the protection state anymore.
> > This enables moving the protection changes later, such that the COMING
> > notifier callbacks can more easily modify the text.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Cc: Jessica Yu 
> > ---
> >  kernel/module.c |8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
> > /* This relies on module_mutex for list integrity. */
> > module_bug_finalize(info->hdr, info->sechdrs, mod);
> >  
> > -   module_enable_ro(mod, false);
> > -   module_enable_nx(mod);
> > -   module_enable_x(mod);
> > -
> > /* Mark state as coming so strong_try_module_get() ignores us,
> >  * but kallsyms etc. can see us. */
> > mod->state = MODULE_STATE_COMING;
> > @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
> > if (err)
> > goto bug_cleanup;
> >  
> > +   module_enable_ro(mod, false);
> > +   module_enable_nx(mod);
> > +   module_enable_x(mod);
> > +
> > /* Module is ready to execute: parsing args may do that. */
> > after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> >   -32768, 32767, mod,
> 
> [ Sorry if this was already discussed, I still have a large backlog. ]
> 
> Doesn't livepatch code also need to be modified?  We have:

Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
arm64/ftrace and klp are the only two users of that function (outside of
module.c) and Mark was already writing a patch for arm64.

Means these last two patches need to wait a little until we've fixed
those.


Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-21 Thread Josh Poimboeuf
On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Jessica Yu 
> ---
>  kernel/module.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
>   /* This relies on module_mutex for list integrity. */
>   module_bug_finalize(info->hdr, info->sechdrs, mod);
>  
> - module_enable_ro(mod, false);
> - module_enable_nx(mod);
> - module_enable_x(mod);
> -
>   /* Mark state as coming so strong_try_module_get() ignores us,
>* but kallsyms etc. can see us. */
>   mod->state = MODULE_STATE_COMING;
> @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
>   if (err)
>   goto bug_cleanup;
>  
> + module_enable_ro(mod, false);
> + module_enable_nx(mod);
> + module_enable_x(mod);
> +
>   /* Module is ready to execute: parsing args may do that. */
>   after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> -32768, 32767, mod,

[ Sorry if this was already discussed, I still have a large backlog. ]

Doesn't livepatch code also need to be modified?  We have:

prepare_coming_module()
klp_module_coming()
klp_init_object_loaded()
module_disable_ro()
...
module_enable_ro()

which is done right before the above patch does module_enable_ro().

We could remove the disable-RO from that case, though we'd still need it
for another case (late module patching).

-- 
Josh



[PATCH v4 15/16] module: Move where we mark modules RO,X

2019-10-18 Thread Peter Zijlstra
Now that set_all_modules_text_*() is gone, nothing depends on the
relation between ->state = COMING and the protection state anymore.
This enables moving the protection changes later, such that the COMING
notifier callbacks can more easily modify the text.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Jessica Yu 
---
 kernel/module.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);
 
-   module_enable_ro(mod, false);
-   module_enable_nx(mod);
-   module_enable_x(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
 * but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3852,6 +3848,10 @@ static int load_module(struct load_info
if (err)
goto bug_cleanup;
 
+   module_enable_ro(mod, false);
+   module_enable_nx(mod);
+   module_enable_x(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
  -32768, 32767, mod,