Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
Masami Hiramatsu (Google) writes:
> On Thu, 2 May 2024 01:35:16 +0800
> Guo Ren wrote:
>
>> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
>> wrote:
>> >
>> > If an error happens in ftrace, ftrace_kill() will prevent disarming
>> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> > freed, yet the kprobes will still be active, and when triggered, they
>> > will use the freed memory, likely resulting in a page fault and panic.
>> >
>> > This behavior can be reproduced quite easily, by creating a kprobe and
>> > then triggering a ftrace_kill(). For simplicity, we can simulate an
>> > ftrace error with a kernel module like [1]:
>> >
>> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> >
>> >   sudo perf probe --add commit_creds
>> >   sudo perf trace -e probe:commit_creds
>> >   # In another terminal
>> >   make
>> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>> >   # Back to perf terminal
>> >   # ctrl-c
>> >   sudo perf probe --del commit_creds
>> >
>> > After a short period, a page fault and panic would occur as the kprobe
>> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> > is supposed to be used only in extreme circumstances, it is invoked in
>> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> > could be triggered, yet the system may continue operating, possibly
>> > without the administrator noticing. If ftrace_kill() does not panic the
>> > system, then we should do everything we can to continue operating,
>> > rather than leave a ticking time bomb.
>> >
>> > Signed-off-by: Stephen Brennan
>> > ---
>> > Changes in v3:
>> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>> >   variable and check it directly in the kprobe handlers.
>> > Link to v1/v2 discussion:
>> >
>> > https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/
>> >
>> >  arch/csky/kernel/probes/ftrace.c     | 3 +++
>> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>> >  arch/parisc/kernel/ftrace.c          | 3 +++
>> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>> >  arch/riscv/kernel/probes/ftrace.c    | 3 +++
>> >  arch/s390/kernel/ftrace.c            | 3 +++
>> >  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
>> >  include/linux/kprobes.h              | 7 +++
>> >  kernel/kprobes.c                     | 6 ++
>> >  kernel/trace/ftrace.c                | 1 +
>> >  10 files changed, 35 insertions(+)
>> >
>> > diff --git a/arch/csky/kernel/probes/ftrace.c
>> > b/arch/csky/kernel/probes/ftrace.c
>> > index 834cffcfbce3..7ba4b98076de 100644
>> > --- a/arch/csky/kernel/probes/ftrace.c
>> > +++ b/arch/csky/kernel/probes/ftrace.c
>> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned
>> > long parent_ip,
>> >  	struct kprobe_ctlblk *kcb;
>> >  	struct pt_regs *regs;
>> >
>> > +	if (unlikely(kprobe_ftrace_disabled))
>> > +		return;
>> > +
>> For csky part.
>> Acked-by: Guo Ren
>
> Thanks Stephen, Guo and Steve!
>
> Let me pick this to probes/for-next!

Thank you Masami! I did want to check, is this the correct git tree to be
watching?

https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next

( I'm not trying to pressure on timing, as I know the merge window is
hectic. Just making sure I'm watching the correct place! )

Thanks,
Stephen
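
For readers following the thread, the guard added to each
kprobe_ftrace_handler() can be modeled in userspace. The sketch below is
only an illustration, not the kernel code: kprobe_ftrace_disabled is the
name from the v3 changelog, while struct model_ops, active_ops and the
model_* functions are hypothetical stand-ins for ftrace_ops and the real
kill/free paths. It assumes GCC or Clang for __builtin_expect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for struct ftrace_ops; it only counts hits. */
struct model_ops {
	unsigned long hits;
};

/* Name taken from the v3 changelog; set once and never cleared here. */
static bool kprobe_ftrace_disabled;

/* Becomes NULL once the ops are "freed" in this model. */
static struct model_ops *active_ops;

/* Models the ftrace_kill() path telling the kprobe handlers to stand down. */
static void model_ftrace_kill(void)
{
	kprobe_ftrace_disabled = true;
}

/* Models the ops being freed while the probe site is still armed. */
static void model_free_ops(void)
{
	active_ops = NULL;
}

/* Models kprobe_ftrace_handler(): bail out before touching the ops. */
static void model_kprobe_ftrace_handler(void)
{
	if (unlikely(kprobe_ftrace_disabled))
		return;

	/* Without the check above, this is where freed memory would be used. */
	assert(active_ops != NULL);
	active_ops->hits++;
}
```

In this model a handler call made after model_ftrace_kill() and
model_free_ops() returns immediately, so the stale pointer is never
dereferenced.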
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
Christophe Leroy writes:
> On 01/05/2024 at 18:29, Stephen Brennan wrote:
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>>
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>>
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>>
>>    sudo perf probe --add commit_creds
>>    sudo perf trace -e probe:commit_creds
>>    # In another terminal
>>    make
>>    sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>    # Back to perf terminal
>>    # ctrl-c
>>    sudo perf probe --del commit_creds
>>
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>>
>> Signed-off-by: Stephen Brennan
>> ---
>> Changes in v3:
>>    Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>    variable and check it directly in the kprobe handlers.
>
> Isn't it safer to provide a function rather than a direct access to a
> variable ?

Is the concern that other code could modify this variable? If so, then I
suppose the function call is safer. But the variable is not exported and
I think built-in code can be trusted not to muck with it. Maybe I'm
missing your point about safety though?

> By the way, wouldn't it be more performant to use a static branch (jump
> label) ?

I agree with Steven's concern that text modification would unfortunately
not be a good way to handle an error in text modification. Especially, I
believe there could be deadlock risks, as static key enablement requires
taking the text_mutex and the jump_label_mutex. I'd be concerned that the
text_mutex could already be held in some situations where ftrace_kill()
is called. But I'm not certain about that.

Thanks for taking a look!
Stephen
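
To make the variable-versus-function question concrete, here is a minimal
sketch of the accessor-style alternative. It is a userspace illustration
under stated assumptions, not what the patch does: v3 checks the variable
directly, and the names probes_disabled, model_kprobe_ftrace_kill() and
model_kprobe_ftrace_is_disabled() are hypothetical.

```c
#include <stdbool.h>

/*
 * File-local flag (hypothetical name). Because it is static, code in
 * other files cannot write to it; they only see the two functions below.
 */
static bool probes_disabled;

/* One-way setter: the flag can be raised but there is no way to clear it. */
void model_kprobe_ftrace_kill(void)
{
	probes_disabled = true;
}

/* Read-only accessor, the alternative to exposing the variable itself. */
bool model_kprobe_ftrace_is_disabled(void)
{
	return probes_disabled;
}
```

Either form is a plain memory read on the handler path, so neither needs
the text patching that a static branch would require.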
[PATCH v3] kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan
---
Changes in v3:
  Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
  variable and check it directly in the kprobe handlers.
Link to v1/v2 discussion:
  https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.bren...@oracle.com/

 arch/csky/kernel/probes/ftrace.c     | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c          | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c    | 3 +++
 arch/s390/kernel/ftrace.c            | 3 +++
 arch/x86/kernel/kprobes/ftrace.c     | 3 +++
 include/linux/kprobes.h              | 7 +++
 kernel/kprobes.c                     | 6 ++
 kernel/trace/ftrace.c                | 1 +
 10 files changed, 35 insertions(+)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..7ba4b98076de 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	struct pt_regs *regs;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..bff058317062 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..c91f9c2e61ed 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..f8208c027148 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct pt_regs *regs;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..a69dfa610aa8 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..7f6f8c438c26 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip,
Re: [PATCH v2] kprobe/ftrace: bail out if ftrace was killed
Steven Rostedt writes:
> On Mon, 29 Apr 2024 10:47:18 -0700
> Stephen Brennan wrote:
>
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>>
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>>
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>>
>>   sudo perf probe --add commit_creds
>>   sudo perf trace -e probe:commit_creds
>>   # In another terminal
>>   make
>>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>   # Back to perf terminal
>>   # ctrl-c
>>   sudo perf probe --del commit_creds
>>
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>>
>> Signed-off-by: Stephen Brennan
>> ---
>> Difference from v1: removed both existing declarations of ftrace_is_dead()
>> from kernel/trace/trace.h.
>>
>>  arch/csky/kernel/probes/ftrace.c     | 3 +++
>>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>>  arch/parisc/kernel/ftrace.c          | 3 +++
>>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>>  arch/riscv/kernel/probes/ftrace.c    | 3 +++
>>  arch/s390/kernel/ftrace.c            | 3 +++
>>  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
>>  include/linux/ftrace.h               | 2 ++
>>  kernel/trace/trace.h                 | 2 --
>>  9 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/csky/kernel/probes/ftrace.c
>> b/arch/csky/kernel/probes/ftrace.c
>> index 834cffcfbce3..3931bf9f707b 100644
>> --- a/arch/csky/kernel/probes/ftrace.c
>> +++ b/arch/csky/kernel/probes/ftrace.c
>> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long
>> parent_ip,
>>  	struct kprobe_ctlblk *kcb;
>>  	struct pt_regs *regs;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/loongarch/kernel/ftrace_dyn.c
>> b/arch/loongarch/kernel/ftrace_dyn.c
>> index 73858c9029cc..82c952cb5be0 100644
>> --- a/arch/loongarch/kernel/ftrace_dyn.c
>> +++ b/arch/loongarch/kernel/ftrace_dyn.c
>> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned
>> long parent_ip,
>>  	struct kprobe *p;
>>  	struct kprobe_ctlblk *kcb;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
>> index 621a4b386ae4..3660834f54c3 100644
>> --- a/arch/parisc/kernel/ftrace.c
>> +++ b/arch/parisc/kernel/ftrace.c
>> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned
>> long parent_ip,
>>  	struct kprobe *p;
>>  	int bit;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c
>> b/arch/powerpc/kernel/kprobes-ftrace.c
>> index 072ebe7f290b..85eb55aa1457 100644
>> --- a/arch/powerpc/kernel/kprobes-ftrace.c
>> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
>> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned
>> long parent_nip,
>>  	struct pt_regs *regs;
>>  	int bit;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(nip, parent_nip);
>>  	if (bit < 0)
Re: [PATCH] kprobe/ftrace: bail out if ftrace was killed
Masami Hiramatsu (Google) writes:
> Hi Stephen,
>
> On Fri, 26 Apr 2024 15:58:34 -0700
> Stephen Brennan wrote:
>
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>
> Hmm, indeed.
>
>>
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>>
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>>
>>   sudo perf probe --add commit_creds
>>   sudo perf trace -e probe:commit_creds
>>   # In another terminal
>>   make
>>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>   # Back to perf terminal
>>   # ctrl-c
>>   sudo perf probe --del commit_creds
>>
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>
> OK, the patch looks good to me.
>
> Acked-by: Masami Hiramatsu (Google)
>
> Thanks!

Hi Masami,

Thank you! Sadly I took a second look at the patch and noticed I forgot
to remove the existing declarations of ftrace_is_dead() from
kernel/trace/trace.h. I've sent v2 in reply to v1 in order to correct
that. I'm sorry for the churn.

Thanks,
Stephen

>>
>> Signed-off-by: Stephen Brennan
>> ---
>>
>> Apologies for the wide net cast here. I recognize that a change like this
>> may need to be split up and go through arch-specific trees. I hoped to get
>> feedback on the patch itself. If it's satisfactory and the architecture
>> maintainers prefer it split out, I'm glad to do it. Thanks!
>>
>>  arch/csky/kernel/probes/ftrace.c     | 3 +++
>>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>>  arch/parisc/kernel/ftrace.c          | 3 +++
>>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>>  arch/riscv/kernel/probes/ftrace.c    | 3 +++
>>  arch/s390/kernel/ftrace.c            | 3 +++
>>  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
>>  include/linux/ftrace.h               | 2 ++
>>  8 files changed, 23 insertions(+)
>>
>> diff --git a/arch/csky/kernel/probes/ftrace.c
>> b/arch/csky/kernel/probes/ftrace.c
>> index 834cffcfbce3..3931bf9f707b 100644
>> --- a/arch/csky/kernel/probes/ftrace.c
>> +++ b/arch/csky/kernel/probes/ftrace.c
>> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long
>> parent_ip,
>>  	struct kprobe_ctlblk *kcb;
>>  	struct pt_regs *regs;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/loongarch/kernel/ftrace_dyn.c
>> b/arch/loongarch/kernel/ftrace_dyn.c
>> index 73858c9029cc..82c952cb5be0 100644
>> --- a/arch/loongarch/kernel/ftrace_dyn.c
>> +++ b/arch/loongarch/kernel/ftrace_dyn.c
>> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned
>> long parent_ip,
>>  	struct kprobe *p;
>>  	struct kprobe_ctlblk *kcb;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
>> index 621a4b386ae4..3660834f54c3 100644
>> --- a/arch/parisc/kernel/ftrace.c
>> +++ b/arch/parisc/kernel/ftrace.c
>> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned
>> long parent_ip,
>>  	struct kprobe *p;
>>  	int bit;
>>
>> +	if (unlikely(ftrace_is_dead()))
>> +		return;
>> +
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (bit < 0)
>>  		return;
>> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c
>> b/ar
[PATCH v2] kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan
---
Difference from v1: removed both existing declarations of ftrace_is_dead()
from kernel/trace/trace.h.

 arch/csky/kernel/probes/ftrace.c     | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c          | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c    | 3 +++
 arch/s390/kernel/ftrace.c            | 3 +++
 arch/x86/kernel/kprobes/ftrace.c     | 3 +++
 include/linux/ftrace.h               | 2 ++
 kernel/trace/trace.h                 | 2 --
 9 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..3931bf9f707b 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	struct pt_regs *regs;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..82c952cb5be0 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..3660834f54c3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..85eb55aa1457 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct pt_regs *regs;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..8814fbe4c888 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..ccbe8ccf945b 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
[PATCH] kprobe/ftrace: bail out if ftrace was killed
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan
---

Apologies for the wide net cast here. I recognize that a change like this
may need to be split up and go through arch-specific trees. I hoped to get
feedback on the patch itself. If it's satisfactory and the architecture
maintainers prefer it split out, I'm glad to do it. Thanks!

 arch/csky/kernel/probes/ftrace.c     | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c          | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c    | 3 +++
 arch/s390/kernel/ftrace.c            | 3 +++
 arch/x86/kernel/kprobes/ftrace.c     | 3 +++
 include/linux/ftrace.h               | 2 ++
 8 files changed, 23 insertions(+)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..3931bf9f707b 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	struct pt_regs *regs;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..82c952cb5be0 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..3660834f54c3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..85eb55aa1457 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct pt_regs *regs;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..8814fbe4c888 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(ftrace_is_dead()))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..ccbe8ccf945b 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely
Re: [PATCH v3 5/9] mm: remove CONFIG_DISCONTIGMEM
Mike Rapoport writes:
> From: Mike Rapoport
>
> There are no architectures that support DISCONTIGMEM left.
>
> Remove the configuration option and the dead code it was guarding in the
> generic memory management code.
>
> Signed-off-by: Mike Rapoport
> ---
>  include/asm-generic/memory_model.h | 37 --
>  include/linux/mmzone.h             |  8 ---
>  mm/Kconfig                         | 25 +++-
>  mm/page_alloc.c                    | 13 ---
>  4 files changed, 12 insertions(+), 71 deletions(-)
>
> diff --git a/include/asm-generic/memory_model.h
> b/include/asm-generic/memory_model.h
> index 7637fb46ba4f..a2c8ed60233a 100644
> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -6,47 +6,18 @@
>
>  #ifndef __ASSEMBLY__
>
> +/*
> + * supports 3 memory models.
> + */

This comment could either be updated to reflect 2 memory models, or
removed entirely.

Thanks,
Stephen

>  #if defined(CONFIG_FLATMEM)
>
>  #ifndef ARCH_PFN_OFFSET
>  #define ARCH_PFN_OFFSET (0UL)
>  #endif
>
> -#elif defined(CONFIG_DISCONTIGMEM)
> -
> -#ifndef arch_pfn_to_nid
> -#define arch_pfn_to_nid(pfn) pfn_to_nid(pfn)
> -#endif
> -
> -#ifndef arch_local_page_offset
> -#define arch_local_page_offset(pfn, nid) \
> -	((pfn) - NODE_DATA(nid)->node_start_pfn)
> -#endif
> -
> -#endif /* CONFIG_DISCONTIGMEM */
> -
> -/*
> - * supports 3 memory models.
> - */
> -#if defined(CONFIG_FLATMEM)
> -
>  #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))
>  #define __page_to_pfn(page) ((unsigned long)((page) - mem_map) + \
>  				 ARCH_PFN_OFFSET)
> -#elif defined(CONFIG_DISCONTIGMEM)
> -
> -#define __pfn_to_page(pfn) \
> -({ unsigned long __pfn = (pfn);\
> -	unsigned long __nid = arch_pfn_to_nid(__pfn); \
> -	NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
> -})
> -
> -#define __page_to_pfn(pg)\
> -({ const struct page *__pg = (pg); \
> -	struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg)); \
> -	(unsigned long)(__pg - __pgdat->node_mem_map) + \
> -	__pgdat->node_start_pfn; \
> -})
>
>  #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> @@ -70,7 +41,7 @@
>  	struct mem_section *__sec = __pfn_to_section(__pfn);\
>  	__section_mem_map_addr(__sec) + __pfn; \
>  })
> -#endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
> +#endif /* CONFIG_FLATMEM/SPARSEMEM */
>
>  /*
>   * Convert a physical address to a Page Frame Number and back
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0d53eba1c383..700032e99419 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -738,10 +738,12 @@ struct zonelist {
>  	struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
>  };
>
> -#ifndef CONFIG_DISCONTIGMEM
> -/* The array of struct pages - for discontigmem use pgdat->lmem_map */
> +/*
> + * The array of struct pages for flatmem.
> + * It must be declared for SPARSEMEM as well because there are configurations
> + * that rely on that.
> + */
>  extern struct page *mem_map;
> -#endif
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  struct deferred_split {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 02d44e3420f5..218b96ccc84a 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -19,7 +19,7 @@ choice
>
>  config FLATMEM_MANUAL
>  	bool "Flat Memory"
> -	depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) ||
> ARCH_FLATMEM_ENABLE
> +	depends on !ARCH_SPARSEMEM_ENABLE || ARCH_FLATMEM_ENABLE
>  	help
>  	  This option is best suited for non-NUMA systems with
>  	  flat address space. The FLATMEM is the most efficient
> @@ -32,21 +32,6 @@ config FLATMEM_MANUAL
>
>  	  If unsure, choose this option (Flat Memory) over any other.
>
> -config DISCONTIGMEM_MANUAL
> -	bool "Discontiguous Memory"
> -	depends on ARCH_DISCONTIGMEM_ENABLE
> -	help
> -	  This option provides enhanced support for discontiguous
> -	  memory systems, over FLATMEM. These systems have holes
> -	  in their physical address spaces, and this option provides
> -	  more efficient handling of these holes.
> -
> -	  Although "Discontiguous Memory" is still used by several
> -	  architectures, it is considered deprecated in favor of
> -	  "Sparse Memory".
> -
> -	  If unsure, choose "Sparse Memory" over this option.
> -
>  config SPARSEMEM_MANUAL
>  	bool "Sparse Memory"
>  	depends on ARCH_SPARSEMEM_ENABLE
> @@ -62,17 +47,13 @@ config SPARSEMEM_MANUAL
>
>  endchoice
>
> -config DISCONTIGMEM
> -	def_bool y
> -	depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) ||
> DISCONTIGMEM_MANUAL
> -
>  config SPARSEMEM
>  	def_bool y
>  	depends on