Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-22 Thread Torsten Beyer

Hi Samuel,

I tried setting the OPP voltages in 50 mV increments, but that doesn't seem
to work. Shouldn't /sys/class/regulator/regulator.9/microvolts show the
actual voltage supplied to the CPU? When I set the lowest value to 1.15 V, I
never see a value lower than 1.2 V there. Either I am looking in the wrong
place or the regulator is rounding the value. Any comments?
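
One way to check which regulator is which is the small Python sketch below. It assumes the usual sysfs layout, where each /sys/class/regulator/regulator.N directory has a "name" file and, for voltage regulators, a "microvolts" file. The function name list_regulators is made up for illustration, and the index (regulator.9 above) may differ between kernels, so matching on the name is probably safer than relying on the number.

```python
from pathlib import Path


def list_regulators(root="/sys/class/regulator"):
    """Return (sysfs directory, regulator name, microvolts or None) tuples."""
    rows = []
    for entry in sorted(Path(root).glob("regulator.*")):
        try:
            name = (entry / "name").read_text().strip()
        except OSError:
            name = "?"  # no readable name file
        try:
            microvolts = int((entry / "microvolts").read_text().strip())
        except (OSError, ValueError):
            microvolts = None  # this regulator exposes no voltage reading
        rows.append((entry.name, name, microvolts))
    return rows


if __name__ == "__main__":
    for directory, name, microvolts in list_regulators():
        shown = "n/a" if microvolts is None else f"{microvolts / 1e6:.3f} V"
        print(f"{directory:14} {name:20} {shown}")
```

With cpufreq scaling active, the lowest OPP voltage would presumably only show up while the CPU is actually running at the lowest frequency, so a single reading may not be conclusive.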

regards
Torsten

sam...@sholland.org wrote on Wednesday, 20 July 2022 at 05:24:27 UTC+2:

> Hi Torsten,
>
> On 7/19/22 3:44 AM, Torsten Beyer wrote:
> > fantastic - I have built a kernel with changed setting yesterday afternoon
> > (increased min OPP to 1.1V) and the system has been running for about 20hrs
> > now without freezes. Thanks for your help - looks like this patch fixes it.
>
> Thanks for the confirmation. The voltage regulator supplying the CPU (reg_dcdc2)
> has a 25 mV step size, so you could see if a smaller change to the OPP is enough
> to make the board stable.
>
> I assume you are powering the board with a reasonably stable power supply? In
> that case, it would be good to apply your change upstream, in case any other
> Cubieboard 2 users are experiencing crashes. If you want to submit a patch, I
> would suggest overriding the OPP table in the board-specific DTS. See here[1]
> for an example of a board that does this.
>
> Regards,
> Samuel
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/sun7i-a20-bananapi.dts#n105
>

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/linux-sunxi/37d78b1c-92da-42c6-b95a-9a756e8713bbn%40googlegroups.com.

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-19 Thread Torsten Beyer

Hi Samuel,

thanks for the additional hints. I will try to follow them (this is the
first time I am poking around in a Unix kernel since 4.3BSD, so I'm still a
bit rusty). The power supply is suitably stable. I will experiment with
lower voltages and see what happens. I may run into questions and would
appreciate your further support if I need it.

cheers
Torsten

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-19 Thread Samuel Holland

Hi Torsten,

On 7/19/22 3:44 AM, Torsten Beyer wrote:
> fantastic - I have built a kernel with changed setting yesterday afternoon
> (increased min OPP to 1.1V) and the system has been running for about 20hrs
> now without freezes. Thanks for your help - looks like this patch fixes it.

Thanks for the confirmation. The voltage regulator supplying the CPU (reg_dcdc2)
has a 25 mV step size, so you could see if a smaller change to the OPP is enough
to make the board stable.
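
To make that suggestion concrete, here is a small Python sketch that lists the voltages one could try between an assumed stock minimum of 1.0 V and the 1.1 V that proved stable, using the 25 mV step size mentioned above. The 1.0 V starting point and the function name candidate_voltages_uv are assumptions for illustration only; the real stock value has to be read from the OPP table in sun7i-a20.dtsi, and the sketch assumes the starting value is itself a valid regulator setting.

```python
def candidate_voltages_uv(low_uv, high_uv, step_uv=25_000):
    """List voltages in microvolts from low_uv to high_uv in regulator steps."""
    if step_uv <= 0:
        raise ValueError("step_uv must be positive")
    if low_uv > high_uv:
        raise ValueError("low_uv must not exceed high_uv")
    return list(range(low_uv, high_uv + 1, step_uv))


if __name__ == "__main__":
    # Assumed range: 1.0 V (hypothetical stock minimum) up to 1.1 V (known stable).
    for uv in candidate_voltages_uv(1_000_000, 1_100_000):
        print(f"{uv} uV = {uv / 1e6:.3f} V")
```

Testing from the top of the list downwards keeps the board in a known-good state for as long as possible.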

I assume you are powering the board with a reasonably stable power supply? In
that case, it would be good to apply your change upstream, in case any other
Cubieboard 2 users are experiencing crashes. If you want to submit a patch, I
would suggest overriding the OPP table in the board-specific DTS. See here[1]
for an example of a board that does this.

Regards,
Samuel

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/sun7i-a20-bananapi.dts#n105

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-19 Thread Torsten Beyer

Hi Samuel,

fantastic - I built a kernel with the changed setting yesterday afternoon
(increased min OPP to 1.1 V) and the system has now been running for about
20 hours without freezes. Thanks for your help - looks like this patch fixes it.

cheers
Torsten

sam...@sholland.org wrote on Tuesday, 19 July 2022 at 04:13:48 UTC+2:

> Hi Torsten,
>
> On 7/18/22 7:13 AM, Torsten Beyer wrote:
> > Hi again,
> > 
> > I believe I found the place. Can you confirm that changing the microvolts OPPs
> > in "arch/arm/boot/dts/sun7i-a20.dtsi" is the right place for upping the
> > microvolts for lower frequencies?
>
> Yes, in that file, in the operating-points property of the cpu nodes.
>
> Regards,
> Samuel
>

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-18 Thread Samuel Holland

Hi Torsten,

On 7/18/22 7:13 AM, Torsten Beyer wrote:
> Hi again,
> 
> I believe I found the place. Can you confirm that changing the microvolts OPPs
> in "arch/arm/boot/dts/sun7i-a20.dtsi" is the right place for upping the
> microvolts for lower frequencies?

Yes, in that file, in the operating-points property of the cpu nodes.
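
As a rough illustration of that edit, the Python sketch below treats the operating-points property as a list of (kHz, uV) pairs, which is how the legacy binding lays it out, and raises every voltage below a chosen floor. The table values in EXAMPLE_OPPS are made up for the example and are not copied from sun7i-a20.dtsi; the function names are likewise hypothetical.

```python
def raise_voltage_floor(opps, floor_uv):
    """Return a copy of the OPP list with every voltage raised to at least floor_uv."""
    return [(khz, max(uv, floor_uv)) for khz, uv in opps]


def format_operating_points(opps):
    """Render (kHz, uV) pairs as a device tree operating-points property."""
    rows = "\n".join(f"\t{khz}\t{uv}" for khz, uv in opps)
    return "operating-points = <\n\t/* kHz\t  uV */\n" + rows + "\n\t>;"


# Hypothetical table for illustration only; read the real values from the dtsi.
EXAMPLE_OPPS = [
    (960000, 1400000),
    (720000, 1200000),
    (312000, 1000000),
    (144000, 1000000),
]

if __name__ == "__main__":
    # Raise the lowest OPPs to 1.1 V, the value reported stable in this thread.
    print(format_operating_points(raise_voltage_floor(EXAMPLE_OPPS, 1_100_000)))
```

Only entries below the floor change, so the higher-frequency operating points keep their original voltages.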

Regards,
Samuel

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-18 Thread Torsten Beyer

Hi again,

I believe I found the place. Can you confirm that the OPP microvolt values
in "arch/arm/boot/dts/sun7i-a20.dtsi" are the right place to raise the
voltage for the lower frequencies?

cheers
-tb

Torsten Beyer wrote on Monday, 18 July 2022 at 09:26:58 UTC+2:

> Hi Samuel,
>
> am stuck trying to figure out how to increase the voltages. Can you point 
> me to some documentation or quickly explain how I would do that?
>
> regards
> Torsten

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-18 Thread Torsten Beyer

Hi Samuel,

I am stuck trying to figure out how to increase the voltages. Can you point
me to some documentation or quickly explain how I would do that?

regards
Torsten

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-16 Thread Torsten Beyer

Hi Samuel,

thanks for your insights. Will try to follow them and will report back 
here. 

In the meantime I have built a kernel with dynamic debug and I can see that 
cpu_freq and the associated calls shown in my earlier post must run 
millions of times. And then out of the blue a crash...so some hw flakiness 
came to my mind, too.

regards
Torsten

Re: [linux-sunxi] Kernel crash in "cpu_freq"

2022-07-15 Thread Samuel Holland

Hi Torsten,

On 7/13/22 3:18 AM, Torsten Beyer wrote:
> Hi all,
> 
> I am trying to debug a bug on an open source air navigation box for
> gliders called openvario. It is
> based on a cubieboard (A20) plus some additional serial connections
> and an optional sensor board for various flight related pressures.
> 
> System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The
> system tends to run for a couple of hours and then freezes/crashes.
> At the bottom of this post I have pasted a typical kernel debug
> output once these freezes happen. The crash always happens in the
> cpu_freq driver. If I set cpu frequency to a fixed frequency (setting
> min=max frequency) those crashes disappear. This seems to be a
> workaround at the cost of fixing cpu speed.
> 
> So it _seems_ the crash is caused by cpu_freq trying to change the
> cpu frequency (at least at some point in time).
> 
> To be honest, I am rather clueless on how to go about finding the
> root of this issue, let alone fixing it. So I thought, I'd ask around
> here whether this bug somehow looks familiar and may have been
> tackled (or even fixed) previously (didn't find anything, though, via
> the search function). In other words: I am thankful for any hint
> people may be able to give me to get nearer to a fix. 

I have not seen something like this before. It looks like hardware
flakiness. Can you provide a disassembly of ccu_div_recalc_rate
from the kernel this splat came from, to confirm my analysis?

> thanks for any pointers
> Torsten
> 
> [26996.004010] Unable to handle kernel paging request at virtual address 08d80050
> [26996.011337] [08d80050] *pgd=
> [26996.014952] Internal error: Oops: 5 [#1] SMP ARM
> [26996.019590] Modules linked in:
> [26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1
> [26996.028509] Hardware name: Allwinner sun7i (A20) Family
> [26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90
> [26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c

The crash is between the calls to ccu_mux_helper_apply_prediv and
divider_recalc_rate, so we are loading arguments for the call to
divider_recalc_rate.

> [26996.044054] pc : [] lr : [] psr: 600b0113
> [26996.050326] sp : f09e5dc8 ip :  fp : c1938200
> [26996.04] r10: c1867440 r9 : 1f78a400 r8 : c1302d00
> [26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 0002 r4 : 08d80084

Assuming r4 is "hw", then the faulting address is cd->div.flags.
This is weird because r5 already contains cd->div.width...

> [26996.067311] r3 :  r2 :  r1 : 0001 r0 : 1f78a400

..and r3 already contains cd->div.table. So we were already able
to access parts of the struct both before and after the faulting
address.

> [26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [26996.080985] Control: 10c5387d Table: 41ff006a DAC: 0051
> [26996.086733] Register r0 information: non-paged memory
> [26996.091799] Register r1 information: non-paged memory
> [26996.096858] Register r2 information: non-paged memory
> [26996.101915] Register r3 information: NULL pointer
> [26996.106627] Register r4 information: non-paged memory
> [26996.111688] Register r5 information: non-paged memory
> [26996.116746] Register r6 information: non-paged memory
> [26996.121805] Register r7 information: non-paged memory
> [26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 pointer offset 0 size 128
> [26996.135514] Register r9 information: non-paged memory
> [26996.140574] Register r10 information: slab task_struct start c1867440 pointer offset 0
> [26996.148517] Register r11 information: slab kmalloc-128 start c1938200 pointer offset 0 size 128
> [26996.157244] Register r12 information: NULL pointer
> [26996.162049] Process sugov:0 (pid: 95, stack limit = 0xf4bf205c)
> [26996.167985] Stack: (0xf09e5dc8 to 0xf09e6000)
> [26996.172361] 5dc0:   c0d81584 c03db530  1f78a400 c1355700 c03d181c

What I think is happening is that the value in r4 got corrupted from
0xc0d81584 (the saved value on the top of the stack) to 0x08d80084.
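
The arithmetic behind that comparison can be sketched in a few lines of Python. The three constants are taken from the oops above; the function name flipped_bits is just an illustrative helper, and the sketch only compares the two values, it does not prove which one is the corrupted copy.

```python
SAVED_R4 = 0xC0D81584       # value saved on the top of the stack
FAULTING_R4 = 0x08D80084    # r4 as reported in the oops
FAULT_ADDRESS = 0x08D80050  # "virtual address" from the first oops line


def flipped_bits(expected, observed, width=32):
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = expected ^ observed
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    print(f"xor                = {SAVED_R4 ^ FAULTING_R4:#010x}")
    print(f"differing bits     = {flipped_bits(SAVED_R4, FAULTING_R4)}")
    print(f"r4 - fault address = {FAULTING_R4 - FAULT_ADDRESS:#x}")
```

Six bits differ between the two values, and the faulting address lies 0x34 below the value in r4, which fits a load made relative to a bad pointer.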

Can you try increasing the voltage of the lower OPPs by 100 mV? And
if that doesn't work, try setting all of the OPPs to 1.4 V. That
should rule out any instability due to an insufficient CPU supply
voltage, and also due to any delay in slewing the regulator output.

Regards,
Samuel

> [26996.180547] 5de0: c1355600 c1355700 1f78a400 c03d34ec  c1355600 1f78a400 39387000
> [26996.188733] 5e00: c1302d00 1f78a400 c1867440 c03d3554  c1302d00 016e3600 39387000
> [26996.196917] 5e20: c1302d00 1f78a400 c1867440 c03d3554 c1355600  1f78a400 c1867440
> [26996.205101] 5e40: c1302d00 1f78a400 c1867440 c03d39f0 1f78a400   1f78a400
> [26996.213287] 5e60: c0d81bd0 df7bf617 c193a340 1f78a400 1f78a400 c1938300 ef7dc050 1f78a400
> [26996.221474] 5e80: c1867440 c03d3c28 c18b3b00 c1938500 1f78a400
