Re: Sun Ultra 45: Kernel Panic (corrupted stack end detected inside scheduler) with 5.19

2023-08-23 Thread Jarl Gullberg
I'm also experiencing this problem with the 
debian-12.0.0-sparc64-NETINST-1.iso from 2023-05-16 on a SPARC T4-2 
(sun4v). Logs are as below; a Debian 11 image from last year works fine 
(unfortunately with an unknown burn date).


> Loading ...
>
> [    2.259595] niu 0001:0a:00.0: can't ioremap BAR 0: [mem size 0x0100 64bit]
> [    2.273835] niu 0001:0a:00.0: Cannot map device registers, aborting
> [    2.288904] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> [    2.304269] CPU: 0 PID: 92 Comm: (udev-worker) Not tainted 6.1.0-9-sparc64 #1  Debian 6.1.27-1
> [    2.321462] Call Trace:
> [    2.326321] [<00caaf50>] dump_stack+0x8/0x18
> [    2.336221] [<00ca0dd8>] panic+0xec/0x344
> [    2.345591] [<00cacdc4>] switch_to_pc+0x4ac/0x4c8
> [    2.356363] [<00cad0f4>] __cond_resched+0x34/0x60
> [    2.367134] [<006914a8>] __kmem_cache_alloc_node+0x468/0x520
> [    2.379801] [<00635660>] kmalloc_trace+0x20/0xa0
> [    2.390401] [<100f48e4>] usb_control_msg+0x24/0x120 [usbcore]
> [    2.403257] [<100e7304>] hub_power_on+0x64/0x180 [usbcore]
> [    2.415582] [<100e7e8c>] hub_activate+0x7ac/0x920 [usbcore]
> [    2.428078] [<100ef500>] hub_probe+0xf60/0xfc0 [usbcore]
> [    2.440062] [<100f994c>] usb_probe_interface+0x14c/0x340 [usbcore]
> [    2.453789] [<00994190>] really_probe+0x290/0x440
> [    2.464542] [<009943cc>] __driver_probe_device+0x8c/0x180
> [    2.476697] [<009944e8>] driver_probe_device+0x28/0xe0
> [    2.488340] [<00994c98>] __device_attach_driver+0x98/0x120
> [    2.500665] [<0099188c>] bus_for_each_drv+0x6c/0xc0
> [    2.511894] Press Stop-A (L1-A) from sun keyboard or send break
> [    2.511894] twice on console to return to the boot prom
> [    2.534001] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
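
For anyone comparing traces like the one above, here is a minimal Python sketch that pulls the function names (and the module, where one is shown) out of the call trace lines. The regular expression and the helper name parse_call_trace are assumptions made for illustration and are not part of any kernel tooling; the sample lines are copied from the log above.

```python
import re

# Matches trace entries such as
#   [<00caaf50>] dump_stack+0x8/0x18
#   [<100f48e4>] usb_control_msg+0x24/0x120 [usbcore]
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_call_trace(log_text):
    """Return (function, module) pairs in the order they appear in the log.

    module is None when the trace line names no module.
    """
    return [
        (match.group("func"), match.group("module"))
        for match in TRACE_RE.finditer(log_text)
    ]


sample = """\
> [    2.326321] [<00caaf50>] dump_stack+0x8/0x18
> [    2.336221] [<00ca0dd8>] panic+0xec/0x344
> [    2.390401] [<100f48e4>] usb_control_msg+0x24/0x120 [usbcore]
"""

for func, module in parse_call_trace(sample):
    print(func, module or "(no module shown)")
```

Entries without a symbol name (bare addresses) are skipped on purpose, since they carry nothing to compare.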





Re: Sun Ultra 45: Kernel Panic (corrupted stack end detected inside scheduler) with 5.19

2022-10-14 Thread j...@pawlicker.com j...@pawlicker.com
(I didn't hit reply-all on the previous one. Whoops, I'm not so good at
mailing lists.)

Thanks for the advice. The 5.16 CD is stable enough to get through the
installation (aside from a video text bug); however, on rebooting it blows up.
I was able to install over the serial console using the same trick as with the
U10 (setting the output and input devices to TTYA). I'm going to post what I
did here so anyone else with an UltraSPARC III/T1 based machine can get
"something" going.

I initially tried the 4.19 CD. However, once the OS was installed, I grabbed
the new GPG key and ran an upgrade, and it blew up a few minutes in. The
upgrade failed really early on, after which I was unable to sudo, and if I
logged out I was unable to log in again. It didn't even let me enter a
password; it simply told me that it was incorrect.

So what I did was use the 5.16 CD. When I was asked the tasksel question, I
used Ctrl-A and 2 to move to the shell, chrooted into /target, used busybox
wget to fetch the kernel you linked, and installed it. This gave me a working
system with the 4.19 kernel. For some reason or another, though, networking
did not want to work on 4.19, so I mounted the CD and installed the deb and
udebs from there (and the kernel image from
https://snapshot.debian.org/archive/debian-ports/20190622T024525Z/pool-sparc64/main/l/linux/),
and networking finally worked. I'd recommend installing those too, or the
udebs matching the kernel you linked, rather than just the kernel.
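
The steps above can be written down as an ordered command list. The following Python sketch only prints the commands; it does not run them. The helper name kernel_install_plan is hypothetical, "dpkg -i" is an assumption about what "installed it" means, and the URL is link [1] from the quoted message, picked only for illustration since the post does not say which of the two linked images was used.

```python
import shlex


def kernel_install_plan(deb_url, target="/target"):
    """Return the workaround's commands in the order they would be typed."""
    # wget saves the download under the last path component of the URL.
    deb_name = deb_url.rsplit("/", 1)[-1]
    return [
        ["chroot", target],            # enter the freshly installed system
        ["busybox", "wget", deb_url],  # fetch the kernel package
        ["dpkg", "-i", deb_name],      # assumption: install it with dpkg
    ]


url = (
    "http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/"
    "pool-sparc64/main/l/linux/"
    "linux-image-4.19.0-5-sparc64_4.19.37-6_sparc64.deb"
)

for argv in kernel_install_plan(url):
    print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review before typing anything into the installer shell, where Python itself is unlikely to be available.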

Now it's rock solid and I can do my weird SPARC experiments. Thanks so much.

> On 10/13/2022 4:14 AM EDT Frank Scheiner wrote:
> 
>  
> Hi Jake,
> 
> On 13.10.22 07:13, j...@pawlicker.com j...@pawlicker.com wrote:
> > I've also been able to confirm that this happens with Kernel 5.16 or at
> > least similar bugs do such as Unable to handle kernel NULL pointer
> > dereference, programs such as postgresql break dramatically, and another
> > time SSH panicked the system with a kernel unaligned access. This
> > happened during apt-get:
> > [...]
> Try with kernel 5.9.x, or maybe better already use 4.19.x on UltraSPARC
> IIIi which works OK most of the time AFAIR. You can get those from
> snapshot.debian.org (e.g. [1] or [2]).
> 
> [1]:
> http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/pool-sparc64/main/l/linux/linux-image-4.19.0-5-sparc64_4.19.37-6_sparc64.deb
> 
> [2]:
> http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/pool-sparc64/main/l/linux/linux-image-4.19.0-5-sparc64-smp_4.19.37-6_sparc64.deb
> 
> ...but unsure if your system will run stable enough to successfully
> finish the installation. Alternatively try to reinstall with an older
> ISO and work from there:
> 
> * with 5.9.0-4:
> https://cdimage.debian.org/cdimage/ports/snapshots/2020-12-03/debian-10.0.0-sparc64-NETINST-1.iso
> 
> * with 4.19.0-5:
> https://cdimage.debian.org/cdimage/ports/snapshots/2019-06-26/debian-10.0-sparc64-NETINST-1.iso
> 
> 
> 
> There seems to be a problem with UltraSPARC T1s and I strongly believe
> this or another problem also affects UltraSPARC III(i)s. I have tested a
> variety of processors here:
> 
> https://lists.debian.org/debian-sparc/2021/12/msg4.html
> 
> For more details on this/these issue(s) see:
> 
> https://lists.debian.org/debian-sparc/2021/03/msg00045.html
> 
> ...and:
> 
> https://lists.debian.org/debian-sparc/2022/02/msg0.html
> 
> Cheers,
> Frank



Re: Sun Ultra 45: Kernel Panic (corrupted stack end detected inside scheduler) with 5.19

2022-10-13 Thread Frank Scheiner

Hi Jake,

On 13.10.22 07:13, j...@pawlicker.com j...@pawlicker.com wrote:

> I've also been able to confirm that this happens with Kernel 5.16 or at
> least similar bugs do such as Unable to handle kernel NULL pointer
> dereference, programs such as postgresql break dramatically, and another
> time SSH panicked the system with a kernel unaligned access. This
> happened during apt-get:
> [...]

Try kernel 5.9.x, or perhaps better, go straight to 4.19.x, which AFAIR works
OK most of the time on UltraSPARC IIIi. You can get those from
snapshot.debian.org (e.g. [1] or [2]).

[1]:
http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/pool-sparc64/main/l/linux/linux-image-4.19.0-5-sparc64_4.19.37-6_sparc64.deb

[2]:
http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/pool-sparc64/main/l/linux/linux-image-4.19.0-5-sparc64-smp_4.19.37-6_sparc64.deb

...but I'm unsure whether your system will run stably enough to finish the
installation successfully. Alternatively, try reinstalling with an older
ISO and work from there:

* with 5.9.0-4:
https://cdimage.debian.org/cdimage/ports/snapshots/2020-12-03/debian-10.0.0-sparc64-NETINST-1.iso

* with 4.19.0-5:
https://cdimage.debian.org/cdimage/ports/snapshots/2019-06-26/debian-10.0-sparc64-NETINST-1.iso
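
The two package links [1] and [2] above differ only in the package name (the second is the -smp flavour). As a small illustration, the Python sketch below splits such a URL into package, version, and architecture, relying on the usual Debian file naming of package_version_architecture.deb. The helper name split_deb_name is made up for this example.

```python
import posixpath
from urllib.parse import urlparse


def split_deb_name(url):
    """Split a .deb URL into (package, version, architecture)."""
    filename = posixpath.basename(urlparse(url).path)
    if not filename.endswith(".deb"):
        raise ValueError(f"not a .deb file: {filename!r}")
    # Debian binary packages are named <package>_<version>_<arch>.deb
    package, version, arch = filename[: -len(".deb")].split("_")
    return package, version, arch


BASE = (
    "http://snapshot.debian.org/archive/debian-ports/20190719T183113Z/"
    "pool-sparc64/main/l/linux/"
)

for name in (
    "linux-image-4.19.0-5-sparc64_4.19.37-6_sparc64.deb",
    "linux-image-4.19.0-5-sparc64-smp_4.19.37-6_sparc64.deb",
):
    print(split_deb_name(BASE + name))
```

Both links carry the same version (4.19.37-6) and architecture (sparc64), so the choice between them comes down to the uniprocessor or SMP image.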



There seems to be a problem with UltraSPARC T1s and I strongly believe
this or another problem also affects UltraSPARC III(i)s. I have tested a
variety of processors here:

https://lists.debian.org/debian-sparc/2021/12/msg4.html

For more details on this/these issue(s) see:

https://lists.debian.org/debian-sparc/2021/03/msg00045.html

...and:

https://lists.debian.org/debian-sparc/2022/02/msg0.html

Cheers,
Frank



Re: Sun Ultra 45: Kernel Panic (corrupted stack end detected inside scheduler) with 5.19

2022-10-12 Thread j...@pawlicker.com j...@pawlicker.com
I've also been able to confirm that this happens with kernel 5.16, or at least
that similar bugs do, such as "Unable to handle kernel NULL pointer
dereference". Programs such as postgresql break dramatically, and another time
SSH panicked the system with a kernel unaligned access. This happened during
apt-get:

[ 1735.463205] Unable to handle kernel NULL pointer dereference
[ 1735.543500] tsk->{mm,active_mm}->context = 0096
[ 1735.622697] tsk->{mm,active_mm}->pgd = fff207dfc000
[ 1735.697892] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
[ 1735.808123] Press Stop-A (L1-A) from sun keyboard or send break
[ 1735.808123] twice on console to return to the boot prom
[ 1735.808158] kernel BUG at kernel/cpu.c:1086!

> On 10/12/2022 11:33 PM EDT j...@pawlicker.com j...@pawlicker.com wrote:
>  
>  
> On my Sun Ultra 45 with two CPUs, Debian does not boot a finished 
> installation using Kernel 5.19. On the 5.16 kernel included on the CD the OS 
> boots just fine if this is selected using GRUB. This also seems to be 
> intermittent, as first booting into 5.19 was stable after trying to use quiet 
> to get an error log after a 5.16 boot, but then rebooting afterwards gave me 
> a more verbose error:
>  
> [...]

Sun Ultra 45: Kernel Panic (corrupted stack end detected inside scheduler) with 5.19

2022-10-12 Thread j...@pawlicker.com j...@pawlicker.com
On my Sun Ultra 45 with two CPUs, Debian does not boot a finished installation
using kernel 5.19. With the 5.16 kernel included on the CD, the OS boots just
fine if it is selected in GRUB. The failure also seems to be intermittent: the
first boot into 5.19 was stable, after I tried using quiet to get an error log
following a 5.16 boot, but rebooting afterwards gave me a more verbose error:
 
Booting `Debian GNU/Linux'
Loading Linux 5.19.0-2-sparc64-smp ...
Loading initial ramdisk ...
[ 0.684502] pci :05:1d.0: unsupported PM cap regs version (4)
[ 4.937387] BAD IRQ ack 0
[ 5.436972] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 5.531706] CPU: 0 PID: 107 Comm: systemd-udevd Not tainted 5.19.0-2-sparc64-smp #1 Debian 5.19.11-1
[ 5.643306] Call Trace:
[ 5.672826] [<00cbe4e8>] dump_stack+0x8/0x18
[ 5.732816] [<00cb7518>] panic+0xf0/0x360
[ 10.067911] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
 
Second boot:
 
Loading Linux 5.19.0-2-sparc64-smp ...
Loading initial ramdisk ...
[ 0.681139] pci :05:1d.0: unsupported PM cap regs version (4)
[ 5.014440] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 5.016901] tg3 :07:04.1 eth1: Tigon3 [partno(BCM95715) rev 9001] (PCIX:133MHz:64-bit) MAC address 00:14:4f:0f:db:ed
[ 5.016925] tg3 :07:04.1 eth1: attached PHY is 5714 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[ 5.016933] tg3 :07:04.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 5.016941] tg3 :07:04.1 eth1: dma_rwctrl[76148000] dma_mask[40-bit]
[ 5.546143] CPU: 0 PID: 117 Comm: systemd-udevd Not tainted 5.19.0-2-sparc64-smp #1 Debian 5.19.11-1
[ 5.675485] Call Trace:
[ 5.710701] [<00cbe4e8>] dump_stack+0x8/0x18
[ 5.782158] [<00cb7518>] panic+0xf0/0x360
[ 5.850408] [<00cc5698>] switch_to_pc+0x834/0x85c
[ 5.927055] [<00cc58e0>] __cond_resched+0x40/0x60
[ 6.003715] [<006bc990>] kmem_cache_alloc_trace+0x430/0x580
[ 6.090777] [<1001ab9c>] usb_control_msg+0x1c/0x120 [usbcore]
[ 6.180114] [<1000d480>] hub_power_on+0x60/0x180 [usbcore]
[ 6.266426] [<1000e0c8>] hub_activate+0x868/0xa00 [usbcore]
[ 6.354029] [<10015638>] hub_probe+0xeb8/0xf20 [usbcore]
[ 6.438467] [<1001fda8>] usb_probe_interface+0xe8/0x300 [usbcore]
[ 6.532373] [<009e8c48>] really_probe+0xc8/0x480
[ 6.608355] [<009e9124>] __driver_probe_device+0x124/0x180
[ 6.694905] [<009e91a8>] driver_probe_device+0x28/0xe0
[ 6.777256] [<009e995c>] __device_attach_driver+0x9c/0x140
[ 6.863759] [<009e6568>] bus_for_each_drv+0x68/0xc0
[ 6.942884] [<009e94c0>] __device_attach+0xa0/0x200
[ 7.022122] Press Stop-A (L1-A) from sun keyboard or send break
[ 7.022122] twice on console to return to the boot prom
[ 7.022158] kernel BUG at kernel/cpu.c:1092!
[ 7.022174] \|/  \|/
[ 7.022174] "@'/ .. \`@"
[ 7.022174] /_| \__/ |_\
[ 7.022174] \__U_/
[ 7.022178] swapper/1(0): Kernel bad sw trap 5 [#1]
[ 7.022185] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-2-sparc64-smp #1 Debian 5.19.11-1
[ 7.022195] TSTATE: 004411e01604 TPC: 00470074 TNPC: 00470078 Y: 000a Not tainted
[ 7.022201] TPC: 
[ 7.00] g0: 00ccc140 g1: 00ff2908 g2: 00ff2908 g3: 02f6
[ 7.05] g4: fff200261600 g5: fff37e8c4000 g6: fff2002a g7: 000e
[ 7.09] o0: 00e2d220 o1: 0444 o2: 4000 o3: 0001
[ 7.022234] o4: 00018701a800 o5: 000e sp: fff2002a3481 ret_pc: 0047006c
[ 7.022238] RPC: 
[ 7.022244] l0: 1000 l1: 004411001603 l2: 0092979c l3: 0400
[ 7.022249] l4:  l5:  l6:  l7: 0008
[ 7.022252] i0: 000e i1: fff2002a0008 i2: 4000 i3: fff37fa00e68
[ 7.022257] i4: fff37e8c4000 i5: 0113ce68 i6: fff2002a3531 i7: 004be388
[ 7.022261] I7: 
[ 7.022271] Call Trace:
[ 7.022273] [<004be388>] do_idle+0x168/0x1a0
[ 7.022280] [<004be684>] cpu_startup_entry+0x24/0x40
[ 7.022286] [<0044080c>] smp_callin+0xec/0x120
[ 7.022293] [<00f6c3d4>] 0xf6c3d4
[ 7.022300] [<8000>] 0x8000
[ 7.022306] Caller[004be388]: do_idle+0x168/0x1a0
[ 7.022311] Caller[004be684]: cpu_startup_entry+0x24/0x40
[ 7.022316] Caller[0044080c]: smp_callin+0xec/0x120
[ 7.022322] Caller[00f6c3d4]: 0xf6c3d4
[ 7.022326] Caller[8000]: 0x8000
[ 7.022330] Instruction DUMP:
[ 7.022332] 92102444
[ 7.022335] 7ffee2ed
[ 7.022337] 9010
[ 7.022339] <91d02005>
[ 7.022341] 0100
[ 7.022343] 0100
[ 7.022345] 9de3bf50
[ 7.022347] 0100
[ 7.022349] 3b003dbc
[ 7.022351]
[ 10.057417] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler
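
The 5.19 trace above and the 6.1.0-9 trace reported later in the thread can be compared frame by frame. The Python sketch below is a rough illustration under stated assumptions: the two lists were copied by hand from the top part of each trace, and the helper names shared_frames and only_in_first are hypothetical. Overlapping frames suggest a similar code path at the time of the panic; they do not by themselves establish a common root cause.

```python
def shared_frames(trace_a, trace_b):
    """Return the function names of trace_a that also occur in trace_b, in order."""
    in_b = set(trace_b)
    return [name for name in trace_a if name in in_b]


def only_in_first(trace_a, trace_b):
    """Return the function names of trace_a that do not occur in trace_b."""
    in_b = set(trace_b)
    return [name for name in trace_a if name not in in_b]


# Copied by hand from the 5.19.0-2-sparc64-smp trace (second boot).
trace_5_19 = [
    "dump_stack", "panic", "switch_to_pc", "__cond_resched",
    "kmem_cache_alloc_trace", "usb_control_msg", "hub_power_on",
    "hub_activate", "hub_probe", "usb_probe_interface",
]

# Copied by hand from the 6.1.0-9-sparc64 trace.
trace_6_1 = [
    "dump_stack", "panic", "switch_to_pc", "__cond_resched",
    "__kmem_cache_alloc_node", "kmalloc_trace", "usb_control_msg",
    "hub_power_on", "hub_activate", "hub_probe", "usb_probe_interface",
]

print("shared:", shared_frames(trace_5_19, trace_6_1))
print("only in 5.19:", only_in_first(trace_5_19, trace_6_1))
print("only in 6.1:", only_in_first(trace_6_1, trace_5_19))
```

In this hand-copied subset, the only frames that differ are the allocator entry points; both traces reach the panic from USB hub probing via __cond_resched.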