[linux-sunxi] Occasional crash on Cubieboard 2

2022-08-25 Thread Torsten Beyer
Hi all,

I am running a 5.19.1 kernel on a cubieboard and occasionally get the crash 
below. Does anyone here have any pointers for me as to what may be going 
wrong?

thanks
Torsten
[ 5826.086429] Unable to handle kernel paging request at virtual address e975ebf8
[ 5826.093658] [e975ebf8] *pgd=6961141e(bad)
[ 5826.097697] Internal error: Oops: 800d [#1] SMP ARM
[ 5826.102930] Modules linked in:
[ 5826.105996] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.19.1 #1
[ 5826.112186] Hardware name: Allwinner sun7i (A20) Family
[ 5826.117414] Workqueue: phy0 0xe975ebf8
[ 5826.121188] PC is at 0xe975ebf8
[ 5826.124338] LR is at process_one_work+0x1d8/0x3e4
[ 5826.129068] pc : [] lr : [] psr: 2013
[ 5826.135336] sp : f0835f30 ip :  fp : c1062e80
[ 5826.140564] r10: c1008e05 r9 : 0080 r8 : 
[ 5826.145792] r7 : c1008e00 r6 : c1006200 r5 : c100b900 r4 : c1b01d94
[ 5826.152323] r3 : e975ebf8 r2 : c100621c r1 : c1371a64 r0 : c1b01d94
[ 5826.158855] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 5826.165996] Control: 10c5387d Table: 421c806a DAC: 0051
[ 5826.171742] Register r0 information: slab kmalloc-8k start c1b0 
pointer offset 7572 size 8192
[ 5826.180655] Register r1 information: slab kmalloc-512 start c1371a00 
pointer offset 100 size 512
[ 5826.189472] Register r2 information: slab kmalloc-512 start c1006200 
pointer offset 28 size 512
[ 5826.198203] Register r3 information: non-slab/vmalloc memory
[ 5826.203873] Register r4 information: slab kmalloc-8k start c1b0 
pointer offset 7572 size 8192
[ 5826.212777] Register r5 information: slab kmalloc-128 start c100b900 
pointer offset 0 size 128
[ 5826.221421] Register r6 information: slab kmalloc-512 start c1006200 
pointer offset 0 size 512
[ 5826.230064] Register r7 information: slab pool_workqueue start c1008e00 
pointer offset 0
[ 5826.238180] Register r8 information: NULL pointer
[ 5826.242892] Register r9 information: non-paged memory
[ 5826.247951] Register r10 information: slab pool_workqueue start c1008e00 
pointer offset 5
[ 5826.256154] Register r11 information: slab task_struct start c1062e80 
pointer offset 0
[ 5826.264098] Register r12 information: NULL pointer
[ 5826.268896] Process kworker/u4:0 (pid: 7, stack limit = 0x712cd2c2)
[ 5826.275191] Stack: (0xf0835f30 to 0xf0836000)
[ 5826.279561] 5f20: c1006200 c1006200 c100621c c100b900
[ 5826.287745] 5f40: c1006200 c100b918 c100621c c0d03d40 0088 c1062e80 
c1006200 c013248c
[ 5826.295930] 5f60: c100b900 c0dbfb67 f081de8c c103f080 c1062e80 c0132434 
c100b900 c103f440
[ 5826.304114] 5f80: f081de8c   c0139a2c c103f080 c013995c 
 
[ 5826.312297] 5fa0:    c0100148   
 
[ 5826.320480] 5fc0:       
 
[ 5826.328663] 5fe0:     0013  
 
[ 5826.336856] process_one_work from worker_thread+0x58/0x54c
[ 5826.342460] worker_thread from kthread+0xd0/0xec
[ 5826.347191] kthread from ret_from_fork+0x14/0x2c
[ 5826.351916] Exception stack(0xf0835fb0 to 0xf0835ff8)
[ 5826.356976] 5fa0:    
[ 5826.365159] 5fc0:       
 
[ 5826.373340] 5fe0:     0013 
[ 5826.379965] Code: fafffdff ffef fbf7ef97 fffd ()
[ 5826.386065] ---[ end trace  ]---
[ 5832.033309] [drm:lima_sched_timedout_job] *ERROR* lima job timeout
[ 5846.203275] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 5846.209243] (detected by 0, t=2102 jiffies, g=469341, q=181 ncpus=2)
[ 5846.215697] rcu: All QSes seen, last rcu_sched kthread activity 2102 
(554620-552518), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 5846.227262] rcu: rcu_sched kthread timer wakeup didn't happen for 2101 
jiffies! g469341 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[ 5846.238568] rcu: Possible timer handling issue on cpu=1 
timer-softirq=113072
[ 5846.245703] rcu: rcu_sched kthread starved for 2102 jiffies! g469341 
f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
[ 5846.256056] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM 
is now expected behavior.
[ 5846.265009] rcu: RCU grace-period kthread stack dump:
[ 5846.270060] task:rcu_sched state:R stack: 7212 pid: 10 ppid: 2 
flags:0x
[ 5846.278457] __schedule from schedule+0x48/0xb0
[ 5846.283023] schedule from schedule_timeout+0x6c/0xcc
[ 5846.288097] schedule_timeout from rcu_gp_fqs_loop+0xf4/0x2a8
[ 5846.293868] rcu_gp_fqs_loop from rcu_gp_kthread+0xf0/0x10c
[ 5846.299459] rcu_gp_kthread from kthread+0xd0/0xec
[ 5846.304275] kthread from ret_from_fork+0x14/0x2c
[ 5846.309000] Exception stack(0xf0841fb0 to 0xf0841ff8)
[ 5846.314062] 1fa0:    
[ 

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-22 Thread Torsten Beyer
Hi Samuel,

I tried setting the OPP voltages in 50 mV increments, but that doesn't seem 
to work. Shouldn't /sys/class/regulator/regulator.9/microvolts show the 
actual voltage (in microvolts) supplied to the CPU? When I set the lowest 
value to 1.15 V, I never see a value lower than 1.2 V there. Either I am 
looking at the wrong file or the regulator is rounding the value, I would 
say. Any comments?

regards
Torsten
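
One way to check the "wrong file" possibility is to list every regulator node together with its name, since the numbering of the regulator.N directories is not guaranteed to match across kernels or configurations. The following is a minimal sketch, assuming the standard sysfs attributes "name" and "microvolts"; the helper name list_regulators is made up for this example.

```python
from pathlib import Path


def list_regulators(base="/sys/class/regulator"):
    """Return (node, name, microvolts) for each regulator that reports a voltage."""
    base_dir = Path(base)
    if not base_dir.is_dir():
        return []
    rows = []
    for node in sorted(base_dir.glob("regulator.*")):
        name_file = node / "name"
        uv_file = node / "microvolts"
        # Not every regulator exposes a voltage; skip those that do not.
        if not (name_file.is_file() and uv_file.is_file()):
            continue
        try:
            name = name_file.read_text().strip()
            microvolts = int(uv_file.read_text().strip())
        except (OSError, ValueError):
            continue
        rows.append((node.name, name, microvolts))
    return rows


if __name__ == "__main__":
    for node, name, microvolts in list_regulators():
        print(f"{node:14s} {name:20s} {microvolts / 1e6:.3f} V")
```

Running this on the board while the governor is idle should show which node actually belongs to the CPU supply and what voltage it reports.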

sam...@sholland.org wrote on Wednesday, 20 July 2022 at 05:24:27 UTC+2:

> Hi Torsten,
>
> On 7/19/22 3:44 AM, Torsten Beyer wrote:
> > fantastic - I have built a kernel with changed setting yesterday afternoon
> > (increased min OPP to 1.1V) and the system has been running for about 20hrs now
> > without freezes. Thanks for your help - looks like this patch fixes it.
>
> Thanks for the confirmation. The voltage regulator supplying the CPU (reg_dcdc2)
> has a 25 mV step size, so you could see if a smaller change to the OPP is enough
> to make the board stable.
>
> I assume you are powering the board with a reasonably stable power supply? In
> that case, it would be good to apply your change upstream, in case any other
> Cubieboard 2 users are experiencing crashes. If you want to submit a patch, I
> would suggest overriding the OPP table in the board-specific DTS. See here[1]
> for an example of a board that does this.
>
> Regards,
> Samuel
>
> [1]:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/sun7i-a20-bananapi.dts#n105
>
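
The rounding question can be checked with a little arithmetic. The sketch below snaps a requested voltage onto a regulator's step grid. The 25 mV step comes from Samuel's reply; the 700 mV lower bound is an assumption made for this example and should be checked against the PMIC datasheet. The helper name snap_to_step is hypothetical.

```python
def snap_to_step(target_uv, step_uv=25_000, min_uv=700_000, round_up=True):
    """Snap a requested voltage (in microvolts) onto the regulator's step grid.

    min_uv is an assumed lower bound of the regulator range, not a value
    taken from the thread.
    """
    if target_uv <= min_uv:
        return min_uv
    steps, remainder = divmod(target_uv - min_uv, step_uv)
    if remainder and round_up:
        steps += 1
    return min_uv + steps * step_uv


if __name__ == "__main__":
    for requested in (1_000_000, 1_110_000, 1_150_000):
        print(requested, "->", snap_to_step(requested))
```

Under these assumptions 1.15 V already lies on the 25 mV grid, so step rounding alone would not obviously turn 1.15 V into 1.2 V; that may point to the sysfs node or the regulator constraints rather than the step size.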

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/linux-sunxi/37d78b1c-92da-42c6-b95a-9a756e8713bbn%40googlegroups.com.


Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-19 Thread Torsten Beyer
Hi Samuel,

thanks for the additional hints. Will try to follow (this is the first time 
I am poking around in a unix kernel since 4.3bsd ... so I'm a bit rusty 
still). The power supply is suitably stable. I will try and experiment with 
lower voltages and see what happens. I may run into questions and would 
appreciate your further support in case I need it.

cheers
Torsten




Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-19 Thread Torsten Beyer
Hi Samuel,

fantastic - I have built a kernel with changed setting yesterday afternoon 
(increased min OPP to 1.1V) and the system has been running for about 20hrs 
now without freezes. Thanks for your help - looks like this patch fixes it. 

cheers
Torsten

sam...@sholland.org wrote on Tuesday, 19 July 2022 at 04:13:48 UTC+2:

> Hi Torsten,
>
> On 7/18/22 7:13 AM, Torsten Beyer wrote:
> > Hi again,
> > 
> > I believe I found the place. Can you confirm that changing the microvolts
> > OPPs in "arch/arm/boot/dts/sun7i-a20.dtsi" is the right place for upping the
> > microvolts for lower frequencies?
>
> Yes, in that file, in the operating-points property of the cpu nodes.
>
> Regards,
> Samuel
>
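
For illustration, the legacy "operating-points" property is a list of <frequency-in-kHz voltage-in-microvolts> pairs, so raising the minimum voltage amounts to applying a floor to the second cell of each pair. The sketch below does that and prints a property that could be pasted into a DTS override. The table values in the example are placeholders, not copied from the kernel tree, and both function names are made up here.

```python
def raise_voltage_floor(opps, floor_uv):
    """Return a copy of (kHz, uV) pairs with every voltage at least floor_uv."""
    return [(khz, max(uv, floor_uv)) for khz, uv in opps]


def format_operating_points(opps):
    """Render (kHz, uV) pairs as a legacy 'operating-points' property."""
    cells = ",\n".join(f"\t<{khz} {uv}>" for khz, uv in opps)
    return "operating-points =\n\t/* kHz uV */\n" + cells + ";"


if __name__ == "__main__":
    # Placeholder values; check them against the real sun7i-a20.dtsi.
    example = [(912000, 1400000), (528000, 1100000), (144000, 1000000)]
    print(format_operating_points(raise_voltage_floor(example, 1_100_000)))
```

Generating the property from a list keeps the frequency column untouched, so only the voltages change between experiments.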



Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-18 Thread Torsten Beyer
Hi again,

I believe I found the place. Can you confirm that changing the microvolts 
OPPs in "arch/arm/boot/dts/sun7i-a20.dtsi" is the right place for upping 
the microvolts for lower frequencies?

cheers
-tb

Torsten Beyer wrote on Monday, 18 July 2022 at 09:26:58 UTC+2:

> Hi Samuel,
>
> am stuck trying to figure out how to increase the voltages. Can you point 
> me to some documentation or quickly explain how I would do that?
>
> regards
> Torsten
>
> sam...@sholland.org wrote on Saturday, 16 July 2022 at 06:16:16 UTC+2:
>
>> Hi Torsten, 
>>
>> On 7/13/22 3:18 AM, Torsten Beyer wrote: 
>> > Hi all, 
>> > 
>> > I am trying to debug a bug on an open source air navigation box for 
>> > gliders called openvario <https://www.openvario.org/doku.php>. It is 
>> > based on a cubieboard (A20) plus some additional serial connections 
>> > and an optional sensor board for various flight related pressures. 
>> > 
>> > System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The 
>> > system tends to run for a couple of hours and then freezes/crashes. 
>> > At the bottom of this post I have pasted a typical kernel debug 
>> > output once these freezes happen. The crash always happens in the 
>> > cpu_freq driver. If I set cpu frequency to a fixed frequency (setting 
>> > min=max frequency) those crashes disappear. This seems to be a 
>> > workaround at the cost of fixing cpu speed. 
>> > 
>> > So it _seems_ the crash is caused by cpu_freq trying to change the 
>> > cpu frequency (at least at some point in time). 
>> > 
>> > To be honest, I am rather clueless on how to go about finding the 
>> > root of this issue, let alone fixing it. So I thought, I'd ask around 
>> > here whether this bug somehow looks familiar and may have been 
>> > tackled (or even fixed) previously (didn't find anything, though, via 
>> > the search function). In other words: I am thankful for any hint 
>> > people may be able to give me to get nearer to a fix.  
>>
>> I have not seen something like this before. It looks like hardware 
>> flakiness. Can you provide a disassembly of ccu_div_recalc_rate 
>> from the kernel this splat came from, to confirm my analysis? 
>>
>> > thanks for any pointers 
>> > Torsten 
>> > 
>> > [26996.004010] Unable to handle kernel paging request at virtual 
>> address 08d80050 
>> > [26996.011337] [08d80050] *pgd= 
>> > [26996.014952] Internal error: Oops: 5 [#1] SMP ARM 
>> > [26996.019590] Modules linked in: 
>> > [26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1 
>> > [26996.028509] Hardware name: Allwinner sun7i (A20) Family 
>> > [26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90 
>> > [26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c 
>>
>> The crash is between the calls to ccu_mux_helper_apply_prediv and 
>> divider_recalc_rate, so we are loading arguments for the call to 
>> divider_recalc_rate. 
>>
>> > [26996.044054] pc : [] lr : [] psr: 600b0113 
>> > [26996.050326] sp : f09e5dc8 ip :  fp : c1938200 
>> > [26996.04] r10: c1867440 r9 : 1f78a400 r8 : c1302d00 
>> > [26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 0002 r4 : 08d80084 
>>
>> Assuming r4 is "hw", then the faulting address is cd->div.flags. 
>> This is weird because r5 already contains cd->div.width... 
>>
>> > [26996.067311] r3 :  r2 :  r1 : 0001 r0 : 1f78a400 
>>
>> ..and r3 already contains cd->div.table. So we were already able 
>> to access parts of the struct both before and after the faulting 
>> address. 
>>
>> > [26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment 
>> none 
>> > [26996.080985] Control: 10c5387d Table: 41ff006a DAC: 0051 
>> > [26996.086733] Register r0 information: non-paged memory 
>> > [26996.091799] Register r1 information: non-paged memory 
>> > [26996.096858] Register r2 information: non-paged memory 
>> > [26996.101915] Register r3 information: NULL pointer 
>> > [26996.106627] Register r4 information: non-paged memory 
>> > [26996.111688] Register r5 information: non-paged memory 
>> > [26996.116746] Register r6 information: non-paged memory 
>> > [26996.121805] Register r7 information: non-paged memory 
>> > [26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 
>> pointer offset 0 size 128 
>> > [2

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-18 Thread Torsten Beyer
Hi Samuel,

am stuck trying to figure out how to increase the voltages. Can you point 
me to some documentation or quickly explain how I would do that?

regards
Torsten

sam...@sholland.org wrote on Saturday, 16 July 2022 at 06:16:16 UTC+2:

> Hi Torsten,
>
> On 7/13/22 3:18 AM, Torsten Beyer wrote:
> > Hi all,
> > 
> > I am trying to debug a bug on an open source air navigation box for
> > gliders called openvario <https://www.openvario.org/doku.php>. It is
> > based on a cubieboard (A20) plus some additional serial connections
> > and an optional sensor board for various flight related pressures.
> > 
> > System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The
> > system tends to run for a couple of hours and then freezes/crashes.
> > At the bottom of this post I have pasted a typical kernel debug
> > output once these freezes happen. The crash always happens in the
> > cpu_freq driver. If I set cpu frequency to a fixed frequency (setting
> > min=max frequency) those crashes disappear. This seems to be a
> > workaround at the cost of fixing cpu speed.
> > 
> > So it _seems_ the crash is caused by cpu_freq trying to change the
> > cpu frequency (at least at some point in time).
> > 
> > To be honest, I am rather clueless on how to go about finding the
> > root of this issue, let alone fixing it. So I thought, I'd ask around
> > here whether this bug somehow looks familiar and may have been
> > tackled (or even fixed) previously (didn't find anything, though, via
> > the search function). In other words: I am thankful for any hint
> > people may be able to give me to get nearer to a fix. 
>
> I have not seen something like this before. It looks like hardware
> flakiness. Can you provide a disassembly of ccu_div_recalc_rate
> from the kernel this splat came from, to confirm my analysis?
>
> > thanks for any pointers
> > Torsten
> > 
> > [26996.004010] Unable to handle kernel paging request at virtual address 08d80050
> > [26996.011337] [08d80050] *pgd=
> > [26996.014952] Internal error: Oops: 5 [#1] SMP ARM
> > [26996.019590] Modules linked in:
> > [26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1
> > [26996.028509] Hardware name: Allwinner sun7i (A20) Family
> > [26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90
> > [26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c
>
> The crash is between the calls to ccu_mux_helper_apply_prediv and
> divider_recalc_rate, so we are loading arguments for the call to
> divider_recalc_rate.
>
> > [26996.044054] pc : [] lr : [] psr: 600b0113
> > [26996.050326] sp : f09e5dc8 ip :  fp : c1938200
> > [26996.04] r10: c1867440 r9 : 1f78a400 r8 : c1302d00
> > [26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 0002 r4 : 08d80084
>
> Assuming r4 is "hw", then the faulting address is cd->div.flags.
> This is weird because r5 already contains cd->div.width...
>
> > [26996.067311] r3 :  r2 :  r1 : 0001 r0 : 1f78a400
>
> ..and r3 already contains cd->div.table. So we were already able
> to access parts of the struct both before and after the faulting
> address.
>
> > [26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [26996.080985] Control: 10c5387d Table: 41ff006a DAC: 0051
> > [26996.086733] Register r0 information: non-paged memory
> > [26996.091799] Register r1 information: non-paged memory
> > [26996.096858] Register r2 information: non-paged memory
> > [26996.101915] Register r3 information: NULL pointer
> > [26996.106627] Register r4 information: non-paged memory
> > [26996.111688] Register r5 information: non-paged memory
> > [26996.116746] Register r6 information: non-paged memory
> > [26996.121805] Register r7 information: non-paged memory
> > [26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 pointer offset 0 size 128
> > [26996.135514] Register r9 information: non-paged memory
> > [26996.140574] Register r10 information: slab task_struct start c1867440 pointer offset 0
> > [26996.148517] Register r11 information: slab kmalloc-128 start c1938200 pointer offset 0 size 128
> > [26996.157244] Register r12 information: NULL pointer
> > [26996.162049] Process sugov:0 (pid: 95, stack limit = 0xf4bf205c)
> > [26996.167985] Stack: (0xf09e5dc8 to 0xf09e6000)
> > [26996.172361] 5dc0: c0d81584 c03db530  1f78a400 c1355700 c03d181c
>
> What I think is happening is that the value in r4 got corrupted from
> 0xc0d81584 (the saved value on the top of the stack) t
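
The comparison Samuel starts to make above (his sentence is cut off in this archive) can be spelled out with a few lines of arithmetic. The sketch below XORs the r4 value from the register dump with the first word of the stack dump and lists the bit positions that differ. Treating the stack word as the value r4 should have held is Samuel's hypothesis, not an established fact, and the helper name flipped_bits is made up for this example.

```python
def flipped_bits(expected, observed):
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = expected ^ observed
    return [bit for bit in range(32) if (diff >> bit) & 1]


if __name__ == "__main__":
    saved_on_stack = 0xC0D81584  # first word of the stack dump
    r4_at_fault = 0x08D80084     # r4 in the register dump
    fault_address = 0x08D80050   # address from the "Unable to handle" line

    print(f"xor          = {saved_on_stack ^ r4_at_fault:#010x}")
    print(f"bits flipped = {flipped_bits(saved_on_stack, r4_at_fault)}")
    print(f"r4 - fault   = {r4_at_fault - fault_address:#x}")
```

Several bits differ between the two words, and the faulting address sits 0x34 below r4, which fits the reading that the access was made relative to r4.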

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-16 Thread Torsten Beyer
Hi Samuel,

thanks for your insights. Will try to follow them and will report back 
here. 

In the meantime I have built a kernel with dynamic debug and I can see that 
cpu_freq and the associated calls shown in my earlier post must run 
millions of times. And then out of the blue a crash...so some hw flakiness 
came to my mind, too.

regards
Torsten

sam...@sholland.org wrote on Saturday, 16 July 2022 at 06:16:16 UTC+2:

> Hi Torsten,
>
> On 7/13/22 3:18 AM, Torsten Beyer wrote:
> > Hi all,
> > 
> > I am trying to debug a bug on an open source air navigation box for
> > gliders called openvario <https://www.openvario.org/doku.php>. It is
> > based on a cubieboard (A20) plus some additional serial connections
> > and an optional sensor board for various flight related pressures.
> > 
> > System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The
> > system tends to run for a couple of hours and then freezes/crashes.
> > At the bottom of this post I have pasted a typical kernel debug
> > output once these freezes happen. The crash always happens in the
> > cpu_freq driver. If I set cpu frequency to a fixed frequency (setting
> > min=max frequency) those crashes disappear. This seems to be a
> > workaround at the cost of fixing cpu speed.
> > 
> > So it _seems_ the crash is caused by cpu_freq trying to change the
> > cpu frequency (at least at some point in time).
> > 
> > To be honest, I am rather clueless on how to go about finding the
> > root of this issue, let alone fixing it. So I thought, I'd ask around
> > here whether this bug somehow looks familiar and may have been
> > tackled (or even fixed) previously (didn't find anything, though, via
> > the search function). In other words: I am thankful for any hint
> > people may be able to give me to get nearer to a fix. 
>
> I have not seen something like this before. It looks like hardware
> flakiness. Can you provide a disassembly of ccu_div_recalc_rate
> from the kernel this splat came from, to confirm my analysis?
>
> > thanks for any pointers
> > Torsten
> > 
> > [26996.004010] Unable to handle kernel paging request at virtual address 
> 08d80050
> > [26996.011337] [08d80050] *pgd=
> > [26996.014952] Internal error: Oops: 5 [#1] SMP ARM
> > [26996.019590] Modules linked in:
> > [26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1
> > [26996.028509] Hardware name: Allwinner sun7i (A20) Family
> > [26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90
> > [26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c
>
> The crash is between the calls to ccu_mux_helper_apply_prediv and
> divider_recalc_rate, so we are loading arguments for the call to
> divider_recalc_rate.
>
> > [26996.044054] pc : [] lr : [] psr: 600b0113
> > [26996.050326] sp : f09e5dc8 ip :  fp : c1938200
> > [26996.04] r10: c1867440 r9 : 1f78a400 r8 : c1302d00
> > [26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 0002 r4 : 08d80084
>
> Assuming r4 is "hw", then the faulting address is cd->div.flags.
> This is weird because r5 already contains cd->div.width...
>
> > [26996.067311] r3 :  r2 :  r1 : 0001 r0 : 1f78a400
>
> ..and r3 already contains cd->div.table. So we were already able
> to access parts of the struct both before and after the faulting
> address.
>
> > [26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment 
> none
> > [26996.080985] Control: 10c5387d Table: 41ff006a DAC: 0051
> > [26996.086733] Register r0 information: non-paged memory
> > [26996.091799] Register r1 information: non-paged memory
> > [26996.096858] Register r2 information: non-paged memory
> > [26996.101915] Register r3 information: NULL pointer
> > [26996.106627] Register r4 information: non-paged memory
> > [26996.111688] Register r5 information: non-paged memory
> > [26996.116746] Register r6 information: non-paged memory
> > [26996.121805] Register r7 information: non-paged memory
> > [26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 
> pointer offset 0 size 128
> > [26996.135514] Register r9 information: non-paged memory
> > [26996.140574] Register r10 information: slab task_struct start c1867440 
> pointer offset 0
> > [26996.148517] Register r11 information: slab kmalloc-128 start c1938200 
> pointer offset 0 size 128
> > [26996.157244] Register r12 information: NULL pointer
> > [26996.162049] Process sugov:0 (pid: 95, stack limit = 0xf4bf205c)
> > [26996.167985] Stack: (0xf09e5dc8 to 0xf09e6000)
> > [26996.172361] 5dc0: c0d81584 c03db530 000

[linux-sunxi] Kernel crash in "cpu_freq"

2022-07-13 Thread Torsten Beyer
Hi all,

I am trying to debug a bug on an open source air navigation box for gliders 
called openvario <https://www.openvario.org/doku.php>. It is based on a 
cubieboard (A20) plus some additional serial connections and an optional 
sensor board for various flight related pressures.

System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The 
system tends to run for a couple of hours and then freezes/crashes. At the 
bottom of this post I have pasted a typical kernel debug output once these 
freezes happen. The crash always happens in the cpu_freq driver. If I set 
cpu frequency to a fixed frequency (setting min=max frequency) those 
crashes disappear. This seems to be a workaround at the cost of fixing cpu 
speed.

So it _seems_ the crash is caused by cpu_freq trying to change the cpu 
frequency (at least at some point in time). 

To be honest, I am rather clueless on how to go about finding the root of 
this issue, let alone fixing it. So I thought, I'd ask around here whether 
this bug somehow looks familiar and may have been tackled (or even fixed) 
previously (didn't find anything, though, via the search function). In 
other words: I am thankful for any hint people may be able to give me to 
get nearer to a fix. 

thanks for any pointers
Torsten
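
For reference, the min=max workaround described above can be scripted against the cpufreq sysfs files. This is a minimal sketch, assuming the usual scaling_min_freq and scaling_max_freq attributes of a cpufreq policy directory; the function name pin_frequency is made up for this example, and writing to the real files needs root.

```python
from pathlib import Path


def pin_frequency(policy_dir, khz=None):
    """Set scaling_min_freq == scaling_max_freq for one cpufreq policy."""
    policy = Path(policy_dir)
    if khz is None:
        # Default to the highest frequency the policy currently allows.
        khz = int((policy / "scaling_max_freq").read_text().strip())
    # With the default (current maximum), only the minimum actually changes.
    (policy / "scaling_max_freq").write_text(f"{khz}\n")
    (policy / "scaling_min_freq").write_text(f"{khz}\n")
    return khz
```

On a typical system the policy directory would be /sys/devices/system/cpu/cpufreq/policy0, so pin_frequency("/sys/devices/system/cpu/cpufreq/policy0") pins the CPU at its current maximum frequency.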

[26996.004010] Unable to handle kernel paging request at virtual address 08d80050
[26996.011337] [08d80050] *pgd=
[26996.014952] Internal error: Oops: 5 [#1] SMP ARM
[26996.019590] Modules linked in:
[26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1
[26996.028509] Hardware name: Allwinner sun7i (A20) Family
[26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90
[26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c
[26996.044054] pc : [] lr : [] psr: 600b0113
[26996.050326] sp : f09e5dc8 ip :  fp : c1938200
[26996.04] r10: c1867440 r9 : 1f78a400 r8 : c1302d00
[26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 0002 r4 : 08d80084
[26996.067311] r3 :  r2 :  r1 : 0001 r0 : 1f78a400
[26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[26996.080985] Control: 10c5387d Table: 41ff006a DAC: 0051
[26996.086733] Register r0 information: non-paged memory
[26996.091799] Register r1 information: non-paged memory
[26996.096858] Register r2 information: non-paged memory
[26996.101915] Register r3 information: NULL pointer
[26996.106627] Register r4 information: non-paged memory
[26996.111688] Register r5 information: non-paged memory
[26996.116746] Register r6 information: non-paged memory
[26996.121805] Register r7 information: non-paged memory
[26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 
pointer offset 0 size 128
[26996.135514] Register r9 information: non-paged memory
[26996.140574] Register r10 information: slab task_struct start c1867440 
pointer offset 0
[26996.148517] Register r11 information: slab kmalloc-128 start c1938200 
pointer offset 0 size 128
[26996.157244] Register r12 information: NULL pointer
[26996.162049] Process sugov:0 (pid: 95, stack limit = 0xf4bf205c)
[26996.167985] Stack: (0xf09e5dc8 to 0xf09e6000)
[26996.172361] 5dc0: c0d81584 c03db530  1f78a400 c1355700 c03d181c
[26996.180547] 5de0: c1355600 c1355700 1f78a400 c03d34ec  c1355600 
1f78a400 39387000
[26996.188733] 5e00: c1302d00 1f78a400 c1867440 c03d3554  c1302d00 
016e3600 39387000
[26996.196917] 5e20: c1302d00 1f78a400 c1867440 c03d3554 c1355600  
1f78a400 c1867440
[26996.205101] 5e40: c1302d00 1f78a400 c1867440 c03d39f0 1f78a400  
 1f78a400
[26996.213287] 5e60: c0d81bd0 df7bf617 c193a340 1f78a400 1f78a400 c1938300 
ef7dc050 1f78a400
[26996.221474] 5e80: c1867440 c03d3c28 c18b3b00 c1938500 1f78a400 c1938300 
ef7dc050 c06122a4
[26996.229659] 5ea0: c1938300 0001  df7bf617 c0d81bd0 c18b3b00 
ef7dc050 1f78a400
[26996.237844] 5ec0: 0007 c1867440 c1938500 c0db652c 00080e80 c0612674 
 c0db617c
[26996.246030] 5ee0: 1f78a400 df7bf617 c1812800 c1812800  c0dfd944 
000ea600 
[26996.254214] 5f00: 0002 c0617054 0001 c1867440   
f09e5f5c c1812800
[26996.262400] 5f20: 000ea600 00080e80 0024 df7bf617 0004 c184ba00 
c184ba14 
[26996.270585] 5f40: 00080e80 c184ba2c 0001 c0a34650  c0159c98 
 c184ba28
[26996.278770] 5f60: c1867440 c0dea144 c184ba2c c0136954 c193a500 c1867440 
c01368e0 c184ba28
[26996.286955] 5f80: c13c2100 f0891c44  c0138194 c193a500 c01380c4 
 
[26996.295138] 5fa0:    c0100148   
 
[26996.303321] 5fc0:       
 
[26996.311505] 5fe0:     0013  
 
[26996.319695] ccu_div_recalc_rate from clk_recalc+0x34/0x78