Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2024-05-27 Thread Thorsten Scherer
Hello,

On Mon, May 27, 2024 at 01:51:49PM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> > On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > > Hello,
> > > > 
> > > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe 
> > > > pointed me to
> > > > this thread.  With the proposed code change applied the procedure
> > > > 
> > > > # set to some known good (randomly guessed) filter function and 
> > > > enable function_graph
> > > > echo mtdblock_open > set_ftrace_filter
> > > > echo function_graph > current_tracer
> > > > 
> > > > # walk available filter funcs
> > > > cat available_filter_functions | while read f; do echo $f | tee -a 
> > > > set_ftrace_filter; sleep 1; done
> > > > 
> > > > produces the following output
> > > > 
> > > > [  159.832173] Insufficient stack space to handle exception!
> > > > [  159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > > > [  159.842701] IRQ stack:  [0xc880..0xc8802000]
> > > > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 
> > > > industrialio_triggered_buffer kfifo_buf capture_events_irq 
> > > > capture_events iio_trig_hrtimer industrialio_sw_trigger 
> > > > industrialio_configfs dm_mod
> > > > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > > [  159.898376] pc : []lr : []psr: 6093
> > > > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > > > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > > > [  159.915227] r7 : 0001  r6 : 0004  r5 : c8e44064  r4 : 
> > > > c8e440ac
> > > > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : 
> > > > c8e44038
> > > > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  
> > > > Segment none
> > > > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 0051
> > > > [  159.941436] Register r0 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > > > [  159.957988] Register r2 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.968791] Register r3 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.979592] Register r4 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  159.990391] Register r5 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.001187] Register r6 information: non-paged memory
> > > > [  160.006303] Register r7 information: non-paged memory
> > > > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > > > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 
> > > > pointer offset 0 size 32
> > > > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > > > [  160.031530] Register r11 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.042422] Register r12 information: 2-page vmalloc region starting 
> > > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> > > 
> > > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > > a function involving the ftrace tracing infrastructure, which is why 8KB
> > > of stack has been gobbled up. It could be
> > > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > > least some extract of the backtrace showing the recursive cycle to
> > > confirm, otherwise there is nothing in your report to confirm. As I'm
> > > not a ftrace user myself, this isn't something I'd test for, so having
> > > a full report would be useful.
> > 
> > Is not having a backtrace related to ftrace_return_address() returning
> > NULL, as Arnd pointed out in
> > https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-714954060...@app.fastmail.com/
> > ?
> 
> Unlikely - the stack and code lines are also missing. I think the
> submitter truncated the oops which is highly likely given that it
> would've dumped all 8kB of the stack in hex, and the trace and
> code lines would be after that.

sorry for causing additional friction by my imprecise description.
Indeed, this was the whole oops b

Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2024-05-27 Thread Russell King (Oracle)
On Mon, May 27, 2024 at 02:28:59PM +0200, Uwe Kleine-König wrote:
> On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> > On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > > Hello,
> > > 
> > > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed 
> > > me to
> > > this thread.  With the proposed code change applied the procedure
> > > 
> > > # set to some known good (randomly guessed) filter function and 
> > > enable function_graph
> > > echo mtdblock_open > set_ftrace_filter
> > > echo function_graph > current_tracer
> > > 
> > > # walk available filter funcs
> > > cat available_filter_functions | while read f; do echo $f | tee -a 
> > > set_ftrace_filter; sleep 1; done
> > > 
> > > produces the following output
> > > 
> > > [  159.832173] Insufficient stack space to handle exception!
> > > [  159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > > [  159.842701] IRQ stack:  [0xc880..0xc8802000]
> > > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 
> > > industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events 
> > > iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > > [  159.898376] pc : []lr : []psr: 6093
> > > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > > [  159.915227] r7 : 0001  r6 : 0004  r5 : c8e44064  r4 : c8e440ac
> > > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> > > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  
> > > Segment none
> > > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 0051
> > > [  159.941436] Register r0 information: 2-page vmalloc region starting at 
> > > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > > [  159.957988] Register r2 information: 2-page vmalloc region starting at 
> > > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.968791] Register r3 information: 2-page vmalloc region starting at 
> > > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.979592] Register r4 information: 2-page vmalloc region starting at 
> > > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  159.990391] Register r5 information: 2-page vmalloc region starting at 
> > > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.001187] Register r6 information: non-paged memory
> > > [  160.006303] Register r7 information: non-paged memory
> > > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 
> > > pointer offset 0 size 32
> > > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > > [  160.031530] Register r11 information: 2-page vmalloc region starting 
> > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.042422] Register r12 information: 2-page vmalloc region starting 
> > > at 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> > 
> > No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> > a function involving the ftrace tracing infrastructure, which is why 8KB
> > of stack has been gobbled up. It could be
> > copy_from_kernel_nofault_allowed() but it would be useful to have at
> > least some extract of the backtrace showing the recursive cycle to
> > confirm, otherwise there is nothing in your report to confirm. As I'm
> > not a ftrace user myself, this isn't something I'd test for, so having
> > a full report would be useful.
> 
> Is not having a backtrace related to ftrace_return_address() returning
> NULL, as Arnd pointed out in
> https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-714954060...@app.fastmail.com/
> ?

Unlikely - the stack and code lines are also missing. I think the
submitter truncated the oops which is highly likely given that it
would've dumped all 8kB of the stack in hex, and the trace and
code lines would be after that.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2024-05-27 Thread Uwe Kleine-König
On Mon, May 27, 2024 at 08:56:16AM +0100, Russell King (Oracle) wrote:
> On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> > Hello,
> > 
> > in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me 
> > to
> > this thread.  With the proposed code change applied the procedure
> > 
> > # set to some known good (randomly guessed) filter function and enable 
> > function_graph
> > echo mtdblock_open > set_ftrace_filter
> > echo function_graph > current_tracer
> > 
> > # walk available filter funcs
> > cat available_filter_functions | while read f; do echo $f | tee -a 
> > set_ftrace_filter; sleep 1; done
> > 
> > produces the following output
> > 
> > [  159.832173] Insufficient stack space to handle exception!
> > [  159.832241] Task stack: [0xc8e44000..0xc8e46000]
> > [  159.842701] IRQ stack:  [0xc880..0xc8802000]
> > [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> > [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> > [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 
> > industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events 
> > iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> > [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> > [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> > [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> > [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> > [  159.898376] pc : []lr : []psr: 6093
> > [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> > [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> > [  159.915227] r7 : 0001  r6 : 0004  r5 : c8e44064  r4 : c8e440ac
> > [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> > [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  
> > Segment none
> > [  159.935652] Control: 0005317f  Table: 83074000  DAC: 0051
> > [  159.941436] Register r0 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.952253] Register r1 information: non-slab/vmalloc memory
> > [  159.957988] Register r2 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.968791] Register r3 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.979592] Register r4 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  159.990391] Register r5 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.001187] Register r6 information: non-paged memory
> > [  160.006303] Register r7 information: non-paged memory
> > [  160.011415] Register r8 information: non-slab/vmalloc memory
> > [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 
> > pointer offset 0 size 32
> > [  160.025718] Register r10 information: non-slab/vmalloc memory
> > [  160.031530] Register r11 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.042422] Register r12 information: 2-page vmalloc region starting at 
> > 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> > [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> > [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)
> 
> No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
> a function involving the ftrace tracing infrastructure, which is why 8KB
> of stack has been gobbled up. It could be
> copy_from_kernel_nofault_allowed() but it would be useful to have at
> least some extract of the backtrace showing the recursive cycle to
> confirm, otherwise there is nothing in your report to confirm. As I'm
> not a ftrace user myself, this isn't something I'd test for, so having
> a full report would be useful.

Is not having a backtrace related to ftrace_return_address() returning
NULL, as Arnd pointed out in
https://lore.kernel.org/linux-arm-kernel/36cd10de-c51c-40ff-90e8-714954060...@app.fastmail.com/
?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |


signature.asc
Description: PGP signature


Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2024-05-27 Thread Russell King (Oracle)
On Mon, May 27, 2024 at 09:43:41AM +0200, Thorsten Scherer wrote:
> Hello,
> 
> in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
> this thread.  With the proposed code change applied the procedure
> 
> # set to some known good (randomly guessed) filter function and enable 
> function_graph
> echo mtdblock_open > set_ftrace_filter
> echo function_graph > current_tracer
> 
> # walk available filter funcs
> cat available_filter_functions | while read f; do echo $f | tee -a 
> set_ftrace_filter; sleep 1; done
> 
> produces the following output
> 
> [  159.832173] Insufficient stack space to handle exception!
> [  159.832241] Task stack: [0xc8e44000..0xc8e46000]
> [  159.842701] IRQ stack:  [0xc880..0xc8802000]
> [  159.847712] Overflow stack: [0xc1934000..0xc1935000]
> [  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> [  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 
> industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events 
> iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
> [  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
> [  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
> [  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
> [  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
> [  159.898376] pc : []lr : []psr: 6093
> [  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
> [  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
> [  159.915227] r7 : 0001  r6 : 0004  r5 : c8e44064  r4 : c8e440ac
> [  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
> [  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [  159.935652] Control: 0005317f  Table: 83074000  DAC: 0051
> [  159.941436] Register r0 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.952253] Register r1 information: non-slab/vmalloc memory
> [  159.957988] Register r2 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.968791] Register r3 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.979592] Register r4 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  159.990391] Register r5 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.001187] Register r6 information: non-paged memory
> [  160.006303] Register r7 information: non-paged memory
> [  160.011415] Register r8 information: non-slab/vmalloc memory
> [  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 
> pointer offset 0 size 32
> [  160.025718] Register r10 information: non-slab/vmalloc memory
> [  160.031530] Register r11 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.042422] Register r12 information: 2-page vmalloc region starting at 
> 0xc8e44000 allocated at kernel_clone+0xa8/0x408
> [  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
> [  160.058955] Stack: (0xc8e44018 to 0xc8e46000)

No backtrace? No Code: line? I'm guessing there was an attempt to ftrace
a function involving the ftrace tracing infrastructure, which is why 8KB
of stack has been gobbled up. It could be
copy_from_kernel_nofault_allowed() but it would be useful to have at
least some extract of the backtrace showing the recursive cycle to
confirm, otherwise there is nothing in your report to confirm. As I'm
not a ftrace user myself, this isn't something I'd test for, so having
a full report would be useful.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2024-05-27 Thread Thorsten Scherer
Hello,

in the context of a panic on an i.MX25 based v6.9 kernel [1] Uwe pointed me to
this thread.  With the proposed code change applied the procedure

# set to some known good (randomly guessed) filter function and enable 
function_graph
echo mtdblock_open > set_ftrace_filter
echo function_graph > current_tracer

# walk available filter funcs
cat available_filter_functions | while read f; do echo $f | tee -a 
set_ftrace_filter; sleep 1; done

produces the following output

[  159.832173] Insufficient stack space to handle exception!
[  159.832241] Task stack: [0xc8e44000..0xc8e46000]
[  159.842701] IRQ stack:  [0xc880..0xc8802000]
[  159.847712] Overflow stack: [0xc1934000..0xc1935000]
[  159.852726] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
[  159.858273] Modules linked in: capture_events_imxgpt ti_ads7950 
industrialio_triggered_buffer kfifo_buf capture_events_irq capture_events 
iio_trig_hrtimer industrialio_sw_trigger industrialio_configfs dm_mod
[  159.876948] CPU: 0 PID: 199 Comm: sh Not tainted 6.9.0 #3
[  159.882412] Hardware name: Freescale i.MX25 (Device Tree Support)
[  159.888547] PC is at prepare_ftrace_return+0x4/0x7c
[  159.893520] LR is at ftrace_graph_caller+0x1c/0x28
[  159.898376] pc : []lr : []psr: 6093
[  159.904690] sp : c8e44018  ip : c8e44018  fp : c8e4403c
[  159.909959] r10: c0c09e78  r9 : c35e9bc0  r8 : c010d9bc
[  159.915227] r7 : 0001  r6 : 0004  r5 : c8e44064  r4 : c8e440ac
[  159.921800] r3 : c8e44030  r2 : c8e4403c  r1 : c010eb9c  r0 : c8e44038
[  159.928376] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
none
[  159.935652] Control: 0005317f  Table: 83074000  DAC: 0051
[  159.941436] Register r0 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.952253] Register r1 information: non-slab/vmalloc memory
[  159.957988] Register r2 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.968791] Register r3 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.979592] Register r4 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  159.990391] Register r5 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.001187] Register r6 information: non-paged memory
[  160.006303] Register r7 information: non-paged memory
[  160.011415] Register r8 information: non-slab/vmalloc memory
[  160.017139] Register r9 information: slab kmalloc-32 start c35e9bc0 pointer 
offset 0 size 32
[  160.025718] Register r10 information: non-slab/vmalloc memory
[  160.031530] Register r11 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.042422] Register r12 information: 2-page vmalloc region starting at 
0xc8e44000 allocated at kernel_clone+0xa8/0x408
[  160.053315] Process sh (pid: 199, stack limit = 0x68fc3abb)
[  160.058955] Stack: (0xc8e44018 to 0xc8e46000)

and points me to copy_from_kernel_nofault_allowed.

Disassembly of section .text:

c010eb8c :
c010eb8c:   e1a0c00dmov ip, sp
c010eb90:   e92dd800push{fp, ip, lr, pc}
c010eb94:   e24cb004sub fp, ip, #4
c010eb98:   e52de004push{lr}; (str lr, [sp, 
#-4]!)
c010eb9c:   ebfffb4fbl  c010d8e0 <__gnu_mcount_nc>
c010eba0:   e35004bfcmp r0, #-1090519040; 0xbf00
c010eba4:   3a03bcc c010ebb8 

c010eba8:   e091addsr0, r0, r1
c010ebac:   33a1movcc   r0, #1
c010ebb0:   23a0movcs   r0, #0
c010ebb4:   e89da800ldm sp, {fp, sp, pc}
c010ebb8:   e3a0mov r0, #0
c010ebbc:   e89da800ldm sp, {fp, sp, pc}


In the time given I couldn't make sense of the issue, so I am passing this
hopefully useful information to you.

Best regards
Thorsten

#regzbot introduced: 953f534a7ed6b725d4f101d2949393acc9262880 ^

[1] 
https://lore.kernel.org/all/alp44tukzo6mvcwl4ke4ehhmojrqnv6xfcdeuliybxfjfvgd3e@gpjvwj33cc76/



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-02 Thread Justin Chen



On 12/2/2023 1:26 AM, Ard Biesheuvel wrote:

On Sat, 2 Dec 2023 at 09:49, Justin Chen  wrote:




On 12/1/23 10:53 PM, Ard Biesheuvel wrote:

On Fri, 1 Dec 2023 at 23:59, Justin Chen  wrote:




On 12/1/23 10:07 AM, Steven Rostedt wrote:

On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:


It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between the stack frame depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.


That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

 71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve


I am not familiar enough to make a call. But from my limited testing
with ARM, I didn't see any issues. If you would like me to, I can submit
a patch to remove the check entirely. Or maybe only disable it for ARM?



Please try the fix I proposed first.


Just tested it. Seems to do the trick.


Thanks


Either solution works for me.



Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.



Fair enough. I will submit a patch. Thanks for the help.


FWIW I also experimented with LLVM, looks like function_graph just
crashes regardless of the issue being discussed. The disassemble of
LLVM[1] does something completely different.




LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.

In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.

Could you be more specific about the crash?



It just hangs with no prints. I can instrument the hang and see what I 
can find.


Justin





[1]
LLVM dump
c0c6faa0 :
c0c6faa0: f0 4f 2d e9   push{r4, r5, r6, r7, r8, r9, r10, r11, lr}
c0c6faa4: 1c b0 8d e2   add r11, sp, #28
c0c6faa8: ac d0 4d e2   sub sp, sp, #172
c0c6faac: 00 70 a0 e1   mov r7, r0
c0c6fab0: c8 0c 04 e3   movwr0, #19656
c0c6fab4: 80 02 4c e3   movtr0, #49792
c0c6fab8: 03 50 a0 e1   mov r5, r3
c0c6fabc: 00 00 90 e5   ldr r0, [r0]
c0c6fac0: 02 a0 a0 e1   mov r10, r2
c0c6fac4: 20 00 0b e5   str r0, [r11, #-32]
c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
c0c6facc: 4b 8b d6 eb   bl  0xc0212800 <__gnu_mcount_nc> @ imm =
#-10867412


smime.p7s
Description: S/MIME Cryptographic Signature


Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-02 Thread Ard Biesheuvel
On Sat, 2 Dec 2023 at 09:49, Justin Chen  wrote:
>
>
>
> On 12/1/23 10:53 PM, Ard Biesheuvel wrote:
> > On Fri, 1 Dec 2023 at 23:59, Justin Chen  wrote:
> >>
> >>
> >>
> >> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> >>> On Fri, 1 Dec 2023 09:25:59 -0800
> >>> Justin Chen  wrote:
> >>>
> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So it is our assumption
> > that there are no gaps between the stack frames is invalid.
> 
>  Thanks for the assistance. The gap between the stack frame depends on
>  the function. Most do not have a gap. Some have 8 (as shown above), some
>  have 12. A single assumption here is not going to work. I'm having a
>  hard time finding out the reasoning for this gap. I tried disabling a
>  bunch of gcc flags as well as -O2 and the gap still exists.
> >>>
> >>> That code was originally added because of some strange things that gcc did
> >>> with mcount (for example, it made a copy of the stack frame that it passed
> >>> to mcount, where the function graph tracer replaced the copy of the return
> >>> stack making the shadow stack go out of sync and crash). This was very 
> >>> hard
> >>> to debug and I added this code to detect it if it happened again.
> >>>
> >>> Well it's been over a decade since that happened (2009).
> >>>
> >>> 71e308a239c09 ("function-graph: add stack frame test")
> >>>
> >>> I'm happy assuming that the compiler folks are aware of our tricks with
> >>> hijacking return calls and I don't expect it to happen again. We can just
> >>> rip out those checks. That is, if it's only causing false positives, I
> >>> don't think it's worth keeping around.
> >>>
> >>> Has it detected any real issues on the Arm platforms?
> >>>
> >>> -- Steve
> >>
> >> I am not familiar enough to make a call. But from my limited testing
> >> with ARM, I didn't see any issues. If you would like me to, I can submit
> >> a patch to remove the check entirely. Or maybe only disable it for ARM?
> >>
> >
> > Please try the fix I proposed first.
>
> Just tested it. Seems to do the trick.

Thanks

> Either solution works for me.
>

Given that this discussion is taking place in the context of the
report of an issue identified by GRAPH_FP_TEST, I don't see how
removing that would be a reasonable conclusion.

> FWIW I also experimented with LLVM, looks like function_graph just
> crashes regardless of the issue being discussed. The disassemble of
> LLVM[1] does something completely different.
>


LLVM does not support CONFIG_UNWINDER_FRAME_POINTER so the fact that
the prologue looks different is expected.

In the case below, the FP {r11} is correctly made to point to a {r11,
lr} tuple on the stack, so the codegen is correct AFAICT. But IIRC we
rely on unwind information for ftrace in this case, not the frame
pointer.

Could you be more specific about the crash?


>
> [1]
> LLVM dump
> c0c6faa0 :
> c0c6faa0: f0 4f 2d e9   push{r4, r5, r6, r7, r8, r9, r10, r11, lr}
> c0c6faa4: 1c b0 8d e2   add r11, sp, #28
> c0c6faa8: ac d0 4d e2   sub sp, sp, #172
> c0c6faac: 00 70 a0 e1   mov r7, r0
> c0c6fab0: c8 0c 04 e3   movwr0, #19656
> c0c6fab4: 80 02 4c e3   movtr0, #49792
> c0c6fab8: 03 50 a0 e1   mov r5, r3
> c0c6fabc: 00 00 90 e5   ldr r0, [r0]
> c0c6fac0: 02 a0 a0 e1   mov r10, r2
> c0c6fac4: 20 00 0b e5   str r0, [r11, #-32]
> c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
> c0c6facc: 4b 8b d6 eb   bl  0xc0212800 <__gnu_mcount_nc> @ imm =
> #-10867412



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-02 Thread Justin Chen



On 12/1/23 10:53 PM, Ard Biesheuvel wrote:

On Fri, 1 Dec 2023 at 23:59, Justin Chen  wrote:




On 12/1/23 10:07 AM, Steven Rostedt wrote:

On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:


It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between the stack frame depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.


That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve


I am not familiar enough to make a call. But from my limited testing
with ARM, I didn't see any issues. If you would like me to, I can submit
a patch to remove the check entirely. Or maybe only disable it for ARM?



Please try the fix I proposed first.


Just tested it. Seems to do the trick. Either solution works for me.

FWIW I also experimented with LLVM, looks like function_graph just 
crashes regardless of the issue being discussed. The disassemble of 
LLVM[1] does something completely different.


Thanks,
Justin

[1]
LLVM dump
c0c6faa0 :
c0c6faa0: f0 4f 2d e9   push{r4, r5, r6, r7, r8, r9, r10, r11, lr}
c0c6faa4: 1c b0 8d e2   add r11, sp, #28
c0c6faa8: ac d0 4d e2   sub sp, sp, #172
c0c6faac: 00 70 a0 e1   mov r7, r0
c0c6fab0: c8 0c 04 e3   movwr0, #19656
c0c6fab4: 80 02 4c e3   movtr0, #49792
c0c6fab8: 03 50 a0 e1   mov r5, r3
c0c6fabc: 00 00 90 e5   ldr r0, [r0]
c0c6fac0: 02 a0 a0 e1   mov r10, r2
c0c6fac4: 20 00 0b e5   str r0, [r11, #-32]
c0c6fac8: 00 40 2d e9   stmdb   sp!, {lr}
c0c6facc: 4b 8b d6 eb   bl  0xc0212800 <__gnu_mcount_nc> @ imm = 
#-10867412


smime.p7s
Description: S/MIME Cryptographic Signature


Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Ard Biesheuvel
On Fri, 1 Dec 2023 at 23:59, Justin Chen  wrote:
>
>
>
> On 12/1/23 10:07 AM, Steven Rostedt wrote:
> > On Fri, 1 Dec 2023 09:25:59 -0800
> > Justin Chen  wrote:
> >
> >>> It appears the sub instruction at 0x6dd0 correctly accounts for the
> >>> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> >>> that there are no gaps between the stack frames is invalid.
> >>
> >> Thanks for the assistance. The gap between the stack frame depends on
> >> the function. Most do not have a gap. Some have 8 (as shown above), some
> >> have 12. A single assumption here is not going to work. I'm having a
> >> hard time finding out the reasoning for this gap. I tried disabling a
> >> bunch of gcc flags as well as -O2 and the gap still exists.
> >
> > That code was originally added because of some strange things that gcc did
> > with mcount (for example, it made a copy of the stack frame that it passed
> > to mcount, where the function graph tracer replaced the copy of the return
> > stack making the shadow stack go out of sync and crash). This was very hard
> > to debug and I added this code to detect it if it happened again.
> >
> > Well it's been over a decade since that happened (2009).
> >
> >71e308a239c09 ("function-graph: add stack frame test")
> >
> > I'm happy assuming that the compiler folks are aware of our tricks with
> > hijacking return calls and I don't expect it to happen again. We can just
> > rip out those checks. That is, if it's only causing false positives, I
> > don't think it's worth keeping around.
> >
> > Has it detected any real issues on the Arm platforms?
> >
> > -- Steve
>
> I am not familiar enough to make a call. But from my limited testing
> with ARM, I didn't see any issues. If you would like me to, I can submit
> a patch to remove the check entirely. Or maybe only disable it for ARM?
>

Please try the fix I proposed first.



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Justin Chen



On 12/1/23 10:07 AM, Steven Rostedt wrote:

On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:


It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between the stack frame depends on
the function. Most do not have a gap. Some have 8 (as shown above), some
have 12. A single assumption here is not going to work. I'm having a
hard time finding out the reasoning for this gap. I tried disabling a
bunch of gcc flags as well as -O2 and the gap still exists.


That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

   71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve


I am not familiar enough to make a call. But from my limited testing 
with ARM, I didn't see any issues. If you would like me to, I can submit 
a patch to remove the check entirely. Or maybe only disable it for ARM?


Thanks,
Justin


smime.p7s
Description: S/MIME Cryptographic Signature


Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Russell King (Oracle)
On Fri, Dec 01, 2023 at 10:12:48AM +0100, Ard Biesheuvel wrote:
> It appears the sub instruction at 0x6dd0 correctly accounts for the
> extra 8 bytes, so the frame pointer is valid. So it is our assumption
> that there are no gaps between the stack frames is invalid.
> 
> Could you try the following change please?
> 
> --- a/arch/arm/kernel/ftrace.c
> +++ b/arch/arm/kernel/ftrace.c
> @@ -235,8 +235,12 @@
> return;
> 
> if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
> -   /* FP points one word below parent's top of stack */
> -   frame_pointer += 4;
> +   /*
> +* The top of stack of the parent is recorded in the stack
> +* frame at offset [fp, #-8].
> +*/
> +   get_kernel_nofault(frame_pointer,
> +  (unsigned long *)(frame_pointer - 8));

Yes, this will get the value of the stack pointer when the function
was entered - which may be the bottom of the parent function's stack
_or_ the start of non-register arguments to this function. So your
replacement has always been more correct.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Steven Rostedt
On Fri, 1 Dec 2023 09:25:59 -0800
Justin Chen  wrote:

> > It appears the sub instruction at 0x6dd0 correctly accounts for the
> > extra 8 bytes, so the frame pointer is valid. So it is our assumption
> > that there are no gaps between the stack frames is invalid.  
> 
> Thanks for the assistance. The gap between the stack frame depends on 
> the function. Most do not have a gap. Some have 8 (as shown above), some 
> have 12. A single assumption here is not going to work. I'm having a 
> hard time finding out the reasoning for this gap. I tried disabling a 
> bunch of gcc flags as well as -O2 and the gap still exists.

That code was originally added because of some strange things that gcc did
with mcount (for example, it made a copy of the stack frame that it passed
to mcount, where the function graph tracer replaced the copy of the return
stack making the shadow stack go out of sync and crash). This was very hard
to debug and I added this code to detect it if it happened again.

Well it's been over a decade since that happened (2009).

  71e308a239c09 ("function-graph: add stack frame test")

I'm happy assuming that the compiler folks are aware of our tricks with
hijacking return calls and I don't expect it to happen again. We can just
rip out those checks. That is, if it's only causing false positives, I
don't think it's worth keeping around.

Has it detected any real issues on the Arm platforms?

-- Steve



Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Justin Chen



On 12/1/2023 1:12 AM, Ard Biesheuvel wrote:

On Fri, 1 Dec 2023 at 00:48, Justin Chen  wrote:


Hello,

Ran into an odd bug that I am unsure what the solution is. Tested a few
kernels versions and they all fail the same.

FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
should see a failure similar to below.

[   63.817239] [ cut here ]
[   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
ftrace_return_to_handler+0x228/0x374
[   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
[   63.831645]   from func packet_setsockopt return to c0b558f4
[   63.843801] Modules linked in: bdc udc_core
[   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
6.1.53-0.1pre-gf0bc552d12f8 #33
[   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
[   63.862227] Backtrace:
[   63.864761]  dump_backtrace from show_stack+0x20/0x24
[   63.869982]  r7:c031cd8c r6:0009 r5:0013 r4:c11c7fac
[   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
[   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
[   63.886111]  r5:00c3 r4:c11bd92c
[   63.889764]  dump_stack from __warn+0x88/0x130
[   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
[   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:
[   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
[   63.913294]  r8:c3a8d840 r7:0002 r6:d1e0df48 r5:c2377a94 r4:c269a400
[   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
[   63.926699]  r8:c0cd8ed0 r7:0008 r6:c418c500 r5:0004 r4:0107
[   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
[   63.939415]  r8:c02002bc r7:0126 r6:0003 r5: r4:0004
[   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
[   63.952053] ---[ end trace  ]---

Sure enough the top of the parent stack is off by 8. (Tested with
gcc6.3/gcc8.3/gcc12.3)
6dcc :
  6dcc:   e1a0c00dmov ip, sp
  6dd0:   e24dd008sub sp, sp, #8 <==
  6dd4:   e92ddff0push{r4, r5, r6, r7, r8, r9, sl,
fp, ip, lr, pc}
  6dd8:   e24cb00csub fp, ip, #12
  6ddc:   e24dd06csub sp, sp, #108@ 0x6c
  6de0:   e52de004push{lr}@ (str lr, [sp,
#-4]!)
  6de4:   ebfebl  0 <__gnu_mcount_nc>

I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
some optimization?), but it isn't being accounted for thus the
FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
looks the test is hitting false positives.



Thanks for the report.

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.


Thanks for the assistance. The gap between the stack frame depends on 
the function. Most do not have a gap. Some have 8 (as shown above), some 
have 12. A single assumption here is not going to work. I'm having a 
hard time finding out the reasoning for this gap. I tried disabling a 
bunch of gcc flags as well as -O2 and the gap still exists.


Thanks,
Justin



Could you try the following change please?

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
 return;

 if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
-   /* FP points one word below parent's top of stack */
-   frame_pointer += 4;
+   /*
+* The top of stack of the parent is recorded in the stack
+* frame at offset [fp, #-8].
+*/
+   get_kernel_nofault(frame_pointer,
+  (unsigned long *)(frame_pointer - 8));
 } else {
 struct stackframe frame = {
 .fp = frame_pointer,


smime.p7s
Description: S/MIME Cryptographic Signature


Re: ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-12-01 Thread Ard Biesheuvel
On Fri, 1 Dec 2023 at 00:48, Justin Chen  wrote:
>
> Hello,
>
> Ran into an odd bug that I am unsure what the solution is. Tested a few
> kernels versions and they all fail the same.
>
> FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace:
> enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when
> UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you
> should see a failure similar to below.
>
> [   63.817239] [ cut here ]
> [   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195
> ftrace_return_to_handler+0x228/0x374
> [   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
> [   63.831645]   from func packet_setsockopt return to c0b558f4
> [   63.843801] Modules linked in: bdc udc_core
> [   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted
> 6.1.53-0.1pre-gf0bc552d12f8 #33
> [   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
> [   63.862227] Backtrace:
> [   63.864761]  dump_backtrace from show_stack+0x20/0x24
> [   63.869982]  r7:c031cd8c r6:0009 r5:0013 r4:c11c7fac
> [   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
> [   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
> [   63.886111]  r5:00c3 r4:c11bd92c
> [   63.889764]  dump_stack from __warn+0x88/0x130
> [   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
> [   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:
> [   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
> [   63.913294]  r8:c3a8d840 r7:0002 r6:d1e0df48 r5:c2377a94 r4:c269a400
> [   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
> [   63.926699]  r8:c0cd8ed0 r7:0008 r6:c418c500 r5:0004 r4:0107
> [   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
> [   63.939415]  r8:c02002bc r7:0126 r6:0003 r5: r4:0004
> [   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
> [   63.952053] ---[ end trace  ]---
>
> Sure enough the top of the parent stack is off by 8. (Tested with
> gcc6.3/gcc8.3/gcc12.3)
> 6dcc :
>  6dcc:   e1a0c00dmov ip, sp
>  6dd0:   e24dd008sub sp, sp, #8 <==
>  6dd4:   e92ddff0push{r4, r5, r6, r7, r8, r9, sl,
> fp, ip, lr, pc}
>  6dd8:   e24cb00csub fp, ip, #12
>  6ddc:   e24dd06csub sp, sp, #108@ 0x6c
>  6de0:   e52de004push{lr}@ (str lr, [sp,
> #-4]!)
>  6de4:   ebfebl  0 <__gnu_mcount_nc>
>
> I'm not quite sure why gcc is putting this extra 8 byte frame (maybe
> some optimization?), but it isn't being accounted for thus the
> FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do
> this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it
> looks the test is hitting false positives.
>

Thanks for the report.

It appears the sub instruction at 0x6dd0 correctly accounts for the
extra 8 bytes, so the frame pointer is valid. So it is our assumption
that there are no gaps between the stack frames is invalid.

Could you try the following change please?

--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -235,8 +235,12 @@
return;

if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER)) {
-   /* FP points one word below parent's top of stack */
-   frame_pointer += 4;
+   /*
+* The top of stack of the parent is recorded in the stack
+* frame at offset [fp, #-8].
+*/
+   get_kernel_nofault(frame_pointer,
+  (unsigned long *)(frame_pointer - 8));
} else {
struct stackframe frame = {
.fp = frame_pointer,



ARM Ftrace Function Graph Fails With UNWINDER_FRAME_POINTER

2023-11-30 Thread Justin Chen

Hello,

Ran into an odd bug that I am unsure what the solution is. Tested a few 
kernels versions and they all fail the same.


FUNCTION_GRAPH_FP_TEST was enabled with 953f534a7ed6 ("ARM: ftrace: 
enable HAVE_FUNCTION_GRAPH_FP_TEST"). This test fails when 
UNWINDER_FRAME_POINTER is enabled. Enable function_graph tracer and you 
should see a failure similar to below.


[   63.817239] [ cut here ]
[   63.822006] WARNING: CPU: 3 PID: 1185 at kernel/trace/fgraph.c:195 
ftrace_return_to_handler+0x228/0x374

[   63.831645] Bad frame pointer: expected d1e0df40, received d1e0df48
[   63.831645]   from func packet_setsockopt return to c0b558f4
[   63.843801] Modules linked in: bdc udc_core
[   63.848246] CPU: 3 PID: 1185 Comm: udhcpc Not tainted 
6.1.53-0.1pre-gf0bc552d12f8 #33

[   63.856209] Hardware name: Broadcom STB (Flattened Device Tree)
[   63.862227] Backtrace:
[   63.864761]  dump_backtrace from show_stack+0x20/0x24
[   63.869982]  r7:c031cd8c r6:0009 r5:0013 r4:c11c7fac
[   63.875736]  show_stack from dump_stack_lvl+0x48/0x54
[   63.880929]  dump_stack_lvl from dump_stack+0x18/0x1c
[   63.886111]  r5:00c3 r4:c11bd92c
[   63.889764]  dump_stack from __warn+0x88/0x130
[   63.894339]  __warn from warn_slowpath_fmt+0x140/0x198
[   63.899631]  r8:d1e0deac r7:c11bd958 r6:c031cd8c r5:c11bd92c r4:
[   63.906431]  warn_slowpath_fmt from ftrace_return_to_handler+0x228/0x374
[   63.913294]  r8:c3a8d840 r7:0002 r6:d1e0df48 r5:c2377a94 r4:c269a400
[   63.920095]  ftrace_return_to_handler from return_to_handler+0xc/0x18
[   63.926699]  r8:c0cd8ed0 r7:0008 r6:c418c500 r5:0004 r4:0107
[   63.933500]  __sys_setsockopt from return_to_handler+0x0/0x18
[   63.939415]  r8:c02002bc r7:0126 r6:0003 r5: r4:0004
[   63.946217]  sys_setsockopt from return_to_handler+0x0/0x18
[   63.952053] ---[ end trace  ]---

Sure enough the top of the parent stack is off by 8. (Tested with 
gcc6.3/gcc8.3/gcc12.3)

6dcc :
    6dcc:   e1a0c00d    mov ip, sp
    6dd0:   e24dd008    sub sp, sp, #8 <==
    6dd4:   e92ddff0    push    {r4, r5, r6, r7, r8, r9, sl, 
fp, ip, lr, pc}

    6dd8:   e24cb00c    sub fp, ip, #12
    6ddc:   e24dd06c    sub sp, sp, #108    @ 0x6c
    6de0:   e52de004    push    {lr}    @ (str lr, [sp, 
#-4]!)

    6de4:   ebfe    bl  0 <__gnu_mcount_nc>

I'm not quite sure why gcc is putting this extra 8 byte frame (maybe 
some optimization?), but it isn't being accounted for thus the 
FUNCTION_GRAPH_FP_TEST for arm fails. Note that only some functions do 
this. Function graph works with FUNCTION_GRAPH_FP_TEST disabled, so it 
looks the test is hitting false positives.


Thanks,
Justin


smime.p7s
Description: S/MIME Cryptographic Signature