Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On Fri, Mar 12, 2021 at 5:27 PM Dennis Clarke wrote:
>
> I have seen this for a few months now. The old old netra machine will
> run just fine endlessly but if I attempt to perform a package update
> then I am always assured to see :
>
> ceres# apt-get update
> Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
> Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
> Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
> Fetched 30.3 MB in 1min 24s (361 kB/s)
> Reading package lists... Done
> ceres#
>
> Then try "upgrade" and the machine drops off the network :
>
> Setting up systemd (247.3-1) ...
> Timeout, server 172.16.35.61 not responding.

Dennis, have you tried testing the machine with stress-ng? It includes a lot of tests, one of which might trigger your issue, and that would probably make the problem easier to hunt down.
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On 3/14/21 5:52 PM, John Paul Adrian Glaubitz wrote:
> On 3/14/21 6:48 PM, Frank Scheiner wrote:
>>> So, if, for example, you want to verify that the memory is okay, you should
>>> run a memtest program.
>>
>> ...the built-in (memory) diagnostics of Sun machines are pretty
>> thorough. This is not a PC. :-)
>
> I doubt that the hardware runs a thorough memory test by default that
> can be compared to a full memtest86 test run.
>

The probability that there is a memory hardware fault after the ECC memory tests done during POST would be very, very low. So close to zero that I cannot even begin to guess how a memory fault would slip past those ECC diagnostics. Those run for quite a while and I have never seen evidence that there was a problem.

See : https://lists.debian.org/debian-sparc/2021/03/msg00026.html

Regardless, we are just going in circles. I don't know if this is a kernel problem or what. I only know that something goes terribly wrong and it may be a systemd-related problem. I think Frank Scheiner made some suggestions and I will go and try to isolate the issue.

> Either way, if the kernel breaks for someone, they will have to bisect the
> issue. I have no means of bisecting a problem if I cannot reproduce
> it in the first place.
>

I agree completely.

Dennis
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On 3/14/21 6:48 PM, Frank Scheiner wrote:
>> So, if, for example, you want to verify that the memory is okay, you should
>> run a memtest program.
>
> ...the built-in (memory) diagnostics of Sun machines are pretty
> thorough. This is not a PC. :-)

I doubt that the hardware runs a thorough memory test by default that can be compared to a full memtest86 test run.

Either way, if the kernel breaks for someone, they will have to bisect the issue. I have no means of bisecting a problem if I cannot reproduce it in the first place.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On 14.03.21 18:21, John Paul Adrian Glaubitz wrote:
> On 3/14/21 5:55 PM, Mike Tremaine wrote:
>> Let’s assume it’s not hardware, Dennis has posted the tests and states
>> the machine ran Sol10 fine.
>
> The fact that Solaris runs fine can be an indicator the hardware is okay,
> but it's not a proper verification that it's actually the case.
>
> For example, if one of the memory modules is bad, it could happen that the
> error shows on Linux but not on Solaris because both allocate different
> memory regions right after the machine has started.

Agreed, but...

> So, if, for example, you want to verify that the memory is okay, you should
> run a memtest program.

...the built-in (memory) diagnostics of Sun machines are pretty thorough. This is not a PC. :-)

Cheers,
Frank
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On 3/14/21 5:55 PM, Mike Tremaine wrote:
> Let’s assume it’s not hardware, Dennis has posted the tests and states
> the machine ran Sol10 fine.

The fact that Solaris runs fine can be an indicator the hardware is okay, but it's not a proper verification that it's actually the case.

For example, if one of the memory modules is bad, it could happen that the error shows on Linux but not on Solaris because both allocate different memory regions right after the machine has started.

So, if, for example, you want to verify that the memory is okay, you should run a memtest program.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
Let’s assume it’s not hardware; Dennis has posted the tests and states the machine ran Sol10 fine.

My only ideas are:

1) Try using apt to update some individual packages to see if that even works. Try dash and bash and whatever, but avoid systemd and any related libraries.

2a) If those succeed, try updating systemd and see if it causes the crash.

or

2b) Try re-executing systemd. I think “kill 1” does that these days.

If you can isolate that it is systemd related, the question is why: is it something in D-Bus or some other subsystem?

In a month or so I’ll finally be going to storage, and I’d be happy to grab my Netra t105 and play along at that point. It would be interesting to know if this issue is specific to the Netra series.

-Mike

> On Mar 13, 2021, at 12:58 PM, Frank Scheiner wrote:
>
> Hi Dennis,
>
> On 13.03.21 20:21, Dennis Clarke wrote:
>> On 3/13/21 5:29 PM, Mike Tremaine wrote:
>>> On Mar 12, 2021, at 5:56 AM, Dennis Clarke wrote:
>> [...]
>> I did send a BRK to the serial port and that drops us into the firmware
>> "ok" prompt. There is a failed fan but in fact the fan is entirely not
>> there. At all. I removed it because it had failed five or six years ago
>> and getting another one is just annoying. Also it is not really needed.
>
> Is the heatsink on the board cooled by a chassis then?
>
>>
>> We can see that there is 1G of ECC memory and the memory passes all the
>> basic tests.
>>
>> Now I set up a few of the firmware variables and reset the unit :
>>
>> ok printenv
>> Variable Name          Value    Default Value
>>
>> [...]
>> local-mac-address?     false    false
>> [...]
>
>> ceres# ip link show
>> 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> 2: enp1s1f1: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
>>     link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
>> 3: enp1s3f1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>     link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
>> ceres#
>>
>> However there must be a bug somewhere because the physical MAC address
>> is the same on both interfaces.
>
> This is due to `local-mac-address?` set to `false` in OBP. See e.g. [1]
> for details.
>
> [1]: https://docs.oracle.com/cd/E36784_01/html/E37475/eyprp.html
>
> Cheers,
> Frank
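Mike's staged approach in this message (upgrade everything except systemd first, then systemd on its own) could be planned with a small helper like the one below. This is only a sketch: the function names and the `SYSTEMD_HINTS` substrings are assumptions made for illustration and do not come from the thread. The helper only builds command strings and runs nothing; `apt-get install --only-upgrade` is the standard apt-get option for upgrading already-installed packages.

```python
from typing import Iterable, List, Tuple

# Hypothetical substrings used to flag systemd-related packages.
# This list is an assumption for illustration; adjust it to the real system.
SYSTEMD_HINTS = ("systemd", "udev", "dbus")


def split_upgrade(packages: Iterable[str],
                  hints: Tuple[str, ...] = SYSTEMD_HINTS
                  ) -> Tuple[List[str], List[str]]:
    """Split package names into (safe_first, systemd_related), keeping order."""
    safe: List[str] = []
    related: List[str] = []
    for name in packages:
        # A package counts as systemd-related if any hint occurs in its name.
        if any(hint in name for hint in hints):
            related.append(name)
        else:
            safe.append(name)
    return safe, related


def plan_commands(packages: Iterable[str]) -> List[str]:
    """Build the two upgrade commands as strings; nothing is executed."""
    safe, related = split_upgrade(packages)
    commands = []
    if safe:
        commands.append("apt-get install --only-upgrade " + " ".join(safe))
    if related:
        commands.append("apt-get install --only-upgrade " + " ".join(related))
    return commands


if __name__ == "__main__":
    for command in plan_commands(["bash", "dash", "systemd", "libsystemd0"]):
        print(command)
```

If the first command completes and the lockup only appears during the second, that would point at the systemd upgrade step rather than at apt or dpkg in general.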
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
Hi Dennis,

On 13.03.21 20:21, Dennis Clarke wrote:
> On 3/13/21 5:29 PM, Mike Tremaine wrote:
>> On Mar 12, 2021, at 5:56 AM, Dennis Clarke wrote:
> [...]
> I did send a BRK to the serial port and that drops us into the firmware
> "ok" prompt. There is a failed fan but in fact the fan is entirely not
> there. At all. I removed it because it had failed five or six years ago
> and getting another one is just annoying. Also it is not really needed.

Is the heatsink on the board cooled by a chassis then?

> We can see that there is 1G of ECC memory and the memory passes all the
> basic tests.
>
> Now I set up a few of the firmware variables and reset the unit :
>
> ok printenv
> Variable Name          Value    Default Value
>
> [...]
> local-mac-address?     false    false
> [...]

> ceres# ip link show
> 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: enp1s1f1: mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
>     link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
> 3: enp1s3f1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
> ceres#
>
> However there must be a bug somewhere because the physical MAC address
> is the same on both interfaces.

This is due to `local-mac-address?` set to `false` in OBP. See e.g. [1] for details.

[1]: https://docs.oracle.com/cd/E36784_01/html/E37475/eyprp.html

Cheers,
Frank
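The shared MAC address that Frank attributes to `local-mac-address?` being `false` can be spotted mechanically in `ip link show` output. The following is a minimal sketch, assuming the usual two-line-per-interface layout of that output; the function name `duplicate_macs` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# "2: enp1s1f1: ..." -> captures the interface name.
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")
# "link/ether 08:00:20:c2:46:48 brd ..." -> captures the MAC address.
ETHER_RE = re.compile(r"link/ether\s+([0-9a-fA-F:]{17})")


def duplicate_macs(ip_link_output: str) -> Dict[str, List[str]]:
    """Map each MAC address used by more than one interface to those interfaces."""
    by_mac: Dict[str, List[str]] = defaultdict(list)
    current = None
    for raw in ip_link_output.splitlines():
        line = raw.strip()
        iface = IFACE_RE.match(line)
        if iface:
            # Remember which interface the following link/ether line belongs to.
            current = iface.group(1)
            continue
        ether = ETHER_RE.search(line)
        if ether and current is not None:
            by_mac[ether.group(1).lower()].append(current)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


if __name__ == "__main__":
    sample = (
        "2: enp1s1f1: mtu 1500 qdisc pfifo_fast state UNKNOWN\n"
        "    link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff\n"
        "3: enp1s3f1: mtu 1500 qdisc noop state DOWN\n"
        "    link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff\n"
    )
    print(duplicate_macs(sample))
```

Loopback entries are ignored because only `link/ether` lines are matched.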
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
On 3/13/21 5:29 PM, Mike Tremaine wrote: >> On Mar 12, 2021, at 5:56 AM, >> Dennis Clarke wrote: >> >> I have seen this for a few months now. The old old netra machine will >> run just fine endlessly but if I attempt to perform a package update >> then I am always assured to see : > What kernel are you on? Let me address that *after* we look at the hardware diagnostics. The old Netra t1 105 is pretty much indestructible with the exception being that the internal battery will die. Which mine has. However this affects nothing as the machine can be left plugged in and powered on for years and the firmware variables are trivial to setup if needed. I did do a power down and then left it cold for a day or two. That is a good way to see the full hardware diagnostics when the power plug is put back in. Thus : LOMlite console Standby lom> LOM event: LOM reset lom>poweron lom> LOM event: power on ps/2 kbd check: ...00fe LOM event: Fan 1 failed LOM event: Fault LED 3Hz Checking Sun KB Done %o0 = ..0055.4001 Executing Power On SelfTest SPARCengine(tm)Ultra CP 1500 POST 1.17 ME created 03/06/00 WARRNING: NVRAM battery is either bad or just replaced! 
Time Stamp [hour:min:sec] 33:30:02 Init POST BSS Init System BSS Probing system keyboard : Done DMMU TLB Tags DMMU TLB Tag Access Test DMMU TLB RAM DMMU TLB RAM Access Test Ecache Tests Probe Ecache ecache_size = 0x0020 Ecache RAM Addr Test Ecache Tag Addr Test Ecache RAM Test Ecache Tag Test Invalidate Ecache Tags All CPU Basic Tests V9 Instruction Test CPU Tick and Tick Compare Reg Test CPU Soft Trap Test CPU Softint Reg and Int Test All Basic MMU Tests DMMU Primary Context Reg Test DMMU Secondary Context Reg Test DMMU TSB Reg Test DMMU Tag Access Reg Test DMMU VA Watchpoint Reg Test DMMU PA Watchpoint Reg Test IMMU TSB Reg Test IMMU Tag Access Reg Test IMMU TLB RAM Access Test IMMU TLB Tag Access Test All Basic Cache Tests Dcache RAM Test Dcache Tag Test Icache RAM Test Icache Tag Test Icache Next Test Icache Predecode Test UltraSPARC IIi MCU Control & Status Regs Init and Tests Init UltraSPARC IIi MCU Control & Status Regs CPU speed : 440 Mhz, mc1 set : 0x544cb9dd Memory Probe and Init Probe Memory INFO: All the memory Group in 10 bit column mode Group 0: 256MB Group 1: 256MB Group 2: 256MB Group 3: 256MB Malloc Post Memory Init Post Memory .. Memory Addr w/ Ecache Map PROM/STACK/NVRAM in DMMU Load Post In Memory Run POST from MEM .. loaded POST in memory Update Master Stack/Frame Pointers All FPU Basic Tests FPU Regs Test FPU State Reg Test FPU Functional Test FPU Trap Test Memory Tests Init Memory ... Memory Addr w/ Ecache Test ECC Memory Addr Test Block Memory Addr Test Block Memory Test ... ... ECC Blk Memory Test ... ... 
All Basic UltraSPARC IIi PBM Tests Init UltraSPARC IIi PBM PIO Decoder and BCT Test PCI Byte Enable Test UltraSPARC IIi IOMMU Regs Test UltraSPARC IIi IOMMU RAM NTA Test UltraSPARC IIi IOMMU CAM NTA Test UltraSPARC IIi IOMMU RAM Address Test UltraSPARC IIi IOMMU CAM Address Test IOMMU TLB Compare Test IOMMU TLB Flush Test PBM Control/Status Reg Test PBM Diag Reg Test UltraSPARC IIi PBM Regs Test All Advanced CPU Tests DMMU Hit/Miss Test DMMU Little Endian Test IU ASI Access Test FPU ASI Access Test Ecache Thrash Test All CPU Error Reporting Tests CPU Addr Align Trap Test DMMU Access Priv Page Test DMMU Write Protected Page Test All Advanced UltraSPARC IIi PBM Tests Init UltraSPARC IIi PBM Consist DMA Wr, IOMMU hit Ebus Test All Basic Cheerio Tests Cheerio Ebus PCI Config Space Test Cheerio Ethernet PCI Config Space Test Cheerio Ebus Engine Reg T
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
t; I have unstable the mix but as point of reference…. > > mgt@xray:~$ uname -a > Linux xray 5.10.0-3-sparc64 #1 Debian 5.10.13-1 (2021-02-06) sparc64 GNU/Linux > mgt@xray:~$ cat /etc/debian_version > bullseye/sid > mgt@xray:~$ cat /proc/cpuinfo > cpu : TI UltraSparc IIi (Sabre) > fpu : UltraSparc IIi integrated FPU > pmu : ultra12 > prom : OBP 3.31.0 2001/07/25 20:36 > type : sun4u > ncpus probed : 1 > ncpus active : 1 > D$ parity tl1 : 0 > I$ parity tl1 : 0 > Cpu0ClkTck: 13d92d40 > cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis > MMU Type : Spitfire > MMU PGSZs : 8K,64K,512K,4MB > > root@xray:/home/users/mgt# apt update > Get:1 http://deb.debian.org/debian-ports <http://deb.debian.org/debian-ports> > sid InRelease [55.3 kB] > Get:2 http://deb.debian.org/debian-ports <http://deb.debian.org/debian-ports> > unreleased InRelease [56.6 kB] > Get:3 http://deb.debian.org/debian-ports <http://deb.debian.org/debian-ports> > sid/main all Packages [9,069 kB] > > Get:4 http://deb.debian.org/debian-ports <http://deb.debian.org/debian-ports> > sid/main sparc64 Packages [21.5 MB] > > Fetched 30.7 MB in 1min 55s (266 kB/s) > > Reading package lists... Done > Building dependency tree... Done > Reading state information... Done > 111 packages can be upgraded. Run 'apt list --upgradable' to see them. > root@xray:/home/users/mgt# apt list --upgradeable > Listing… Done > . > . > > apt upgrade was then run and 111 packages upgraded without issue…. > >> Setting up systemd (247.3-1) ... >> Timeout, server 172.16.35.61 not responding. >> >> On the serial console we see : >> >> ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system >> mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP >> +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID >> +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified) >> [2968669.411163] systemd[1]: Detected architecture sparc64. >> [2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
>> [systemd:1] >> [2968696.794780] Modules linked in: drm(E) >> drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E) >> display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) >> autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) >> sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) >> crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E) >> scsi_transport_spi(E) scsi_mod(E) sunhme(E) >> [2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: GE >> 5.10.0-1-sparc64 #1 Debian 5.10.5-1 >> [2968697.391074] TSTATE: 11001604 TPC: 0094c4f0 TNPC: >> 0094c4f4 Y: Tainted: GE >> [2968697.541033] TPC: >> [2968697.593712] g0: f800065a1c80 g1: 0098 g2: >> g3: 0002 >> [2968697.710488] g4: f80004197020 g5: 00e93214 g6: >> f80004198000 g7: 0058 >> [2968697.827256] o0: 00f24960 o1: f800049ab110 o2: >> 0004 o3: >> [2968697.944022] o4: o5: sp: >> f8000419af81 ret_pc: 0094c4c0 >> [2968698.065369] RPC: >> [2968698.118074] l0: 00f24800 l1: f800041ce021 l2: >> 0003e775fef2 l3: 0003e775fef2 >> [2968698.234848] l4: 0002 l5: f8000419b8f0 l6: >> 00e12000 l7: 0001 >> [2968698.351615] i0: f8000b791048 i1: f800049ab100 i2: >> 00f24800 i3: 00f24978 >> [2968698.468381] i4: 00eb i5: 10040818 i6: >> f8000419b031 i7: 00665838 >> [2968698.585168] I7: >> [2968698.638996] Call Trace: >> [2968698.673323] [<00665838>] chrdev_open+0x98/0x1e0 >> [2968698.744355] [<0065ae30>] do_dentry_open+0x170/0x420 >> [2968698.819928] [<0065ca68>] vfs_open+0x28/0x40 >> [2968698.886379] [<00671348>] path_openat+0x988/0x1100 >> [2968698.959682] [<00673dd0>] do_filp_open+0x50/0x100 >> [2968699.031837] [<0065cd30>] do_sys_openat2+0x70/0x180 >> [2968699.106284] [<0065d268>] sys_openat+0x48/0xc0 >> [2968699.175027] [<00406174>] linux_sparc_syscall+0x34/0x44 >> ~ >> Type 'go' to resume >> ok ~ >> [EOT] >> >> This is pretty consistent behavior. If someone has any ideas that would >> be great. 
>> I realize that the old old Netra X1 or Netra T1 is well past
>> its prime but it does run very stable. I would love to fire up a big
>> Oracle M4000 unit to try but I have not heard from anyone anywhere that
>> knows if that can work at all. So for now these old netra units are all
>> that I can test with.
>>
>> --
>> Dennis Clarke
>> RISC-V/SPARC/PPC/ARM/CISC
>> UNIX and Linux spoken
>> GreyBeard and suspenders optional
>
> The Netras have a few different devices; I wonder if there is a bug in one of
> those drivers?
>
> -Mike
Re: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
> On Mar 12, 2021, at 5:56 AM, Dennis Clarke wrote:
>
> I have seen this for a few months now. The old old netra machine will
> run just fine endlessly but if I attempt to perform a package update
> then I am always assured to see :
>

What kernel are you on? I do not have a Netra handy (but I have one in storage, like everyone ;p ). I have an Ultra 5 here, so an UltraSparc IIi CPU. It does not exhibit this behavior. Any chance the memory module needs to be reseated?

> ceres# apt-get update
> Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
> Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
> Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
> Fetched 30.3 MB in 1min 24s (361 kB/s)
> Reading package lists... Done
> ceres#
>
> Then try "upgrade" and the machine drops off the network :
>

I have unstable in the mix, but as a point of reference:

mgt@xray:~$ uname -a
Linux xray 5.10.0-3-sparc64 #1 Debian 5.10.13-1 (2021-02-06) sparc64 GNU/Linux
mgt@xray:~$ cat /etc/debian_version
bullseye/sid
mgt@xray:~$ cat /proc/cpuinfo
cpu : TI UltraSparc IIi (Sabre)
fpu : UltraSparc IIi integrated FPU
pmu : ultra12
prom : OBP 3.31.0 2001/07/25 20:36
type : sun4u
ncpus probed : 1
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 13d92d40
cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type : Spitfire
MMU PGSZs : 8K,64K,512K,4MB

root@xray:/home/users/mgt# apt update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports unreleased InRelease [56.6 kB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [9,069 kB]
Get:4 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.5 MB]
Fetched 30.7 MB in 1min 55s (266 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
111 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@xray:/home/users/mgt# apt list --upgradeable Listing… Done . . apt upgrade was then run and 111 packages upgraded without issue…. > Setting up systemd (247.3-1) ... > Timeout, server 172.16.35.61 not responding. > > On the serial console we see : > > ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system > mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP > +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID > +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified) > [2968669.411163] systemd[1]: Detected architecture sparc64. > [2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! > [systemd:1] > [2968696.794780] Modules linked in: drm(E) > drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E) > display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) > autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) > sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) > crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E) > scsi_transport_spi(E) scsi_mod(E) sunhme(E) > [2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: GE > 5.10.0-1-sparc64 #1 Debian 5.10.5-1 > [2968697.391074] TSTATE: 11001604 TPC: 0094c4f0 TNPC: > 0094c4f4 Y: Tainted: GE > [2968697.541033] TPC: > [2968697.593712] g0: f800065a1c80 g1: 0098 g2: > g3: 0002 > [2968697.710488] g4: f80004197020 g5: 00e93214 g6: > f80004198000 g7: 0058 > [2968697.827256] o0: 00f24960 o1: f800049ab110 o2: > 0004 o3: > [2968697.944022] o4: o5: sp: > f8000419af81 ret_pc: 0094c4c0 > [2968698.065369] RPC: > [2968698.118074] l0: 00f24800 l1: f800041ce021 l2: > 0003e775fef2 l3: 0003e775fef2 > [2968698.234848] l4: 0002 l5: f8000419b8f0 l6: > 00e12000 l7: 0001 > [2968698.351615] i0: f8000b791048 i1: f800049ab100 i2: > 00f24800 i3: 00f24978 > [2968698.468381] i4: 00eb i5: 10040818 i6: > f8000419b031 i7: 00665838 > [2968698.585168] I7: > [2968698.638996] Call Trace: > [2968698.673323] [<00665838>] 
chrdev_open+0x98/0x1e0 > [2968698.744355] [<0065ae30>] do_dentry_open+0x170/0x42
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
I have seen this for a few months now. The old old netra machine will run just fine endlessly but if I attempt to perform a package update then I am always assured to see :

ceres# apt-get update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
Fetched 30.3 MB in 1min 24s (361 kB/s)
Reading package lists... Done
ceres#

Then try "upgrade" and the machine drops off the network :

Setting up systemd (247.3-1) ...
Timeout, server 172.16.35.61 not responding.

On the serial console we see :

ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[2968669.411163] systemd[1]: Detected architecture sparc64.
[2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
[2968696.794780] Modules linked in: drm(E) drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
[2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: GE 5.10.0-1-sparc64 #1 Debian 5.10.5-1
[2968697.391074] TSTATE: 11001604 TPC: 0094c4f0 TNPC: 0094c4f4 Y: Tainted: GE
[2968697.541033] TPC:
[2968697.593712] g0: f800065a1c80 g1: 0098 g2: g3: 0002
[2968697.710488] g4: f80004197020 g5: 00e93214 g6: f80004198000 g7: 0058
[2968697.827256] o0: 00f24960 o1: f800049ab110 o2: 0004 o3:
[2968697.944022] o4: o5: sp: f8000419af81 ret_pc: 0094c4c0
[2968698.065369] RPC:
[2968698.118074] l0: 00f24800 l1: f800041ce021 l2: 0003e775fef2 l3: 0003e775fef2
[2968698.234848] l4: 0002 l5: f8000419b8f0 l6: 00e12000 l7: 0001
[2968698.351615] i0: f8000b791048 i1: f800049ab100 i2: 00f24800 i3: 00f24978
[2968698.468381] i4: 00eb i5: 10040818 i6: f8000419b031 i7: 00665838
[2968698.585168] I7:
[2968698.638996] Call Trace:
[2968698.673323] [<00665838>] chrdev_open+0x98/0x1e0
[2968698.744355] [<0065ae30>] do_dentry_open+0x170/0x420
[2968698.819928] [<0065ca68>] vfs_open+0x28/0x40
[2968698.886379] [<00671348>] path_openat+0x988/0x1100
[2968698.959682] [<00673dd0>] do_filp_open+0x50/0x100
[2968699.031837] [<0065cd30>] do_sys_openat2+0x70/0x180
[2968699.106284] [<0065d268>] sys_openat+0x48/0xc0
[2968699.175027] [<00406174>] linux_sparc_syscall+0x34/0x44
~
Type 'go' to resume
ok ~
[EOT]

This is pretty consistent behavior. If someone has any ideas that would be great. I realize that the old old Netra X1 or Netra T1 is well past its prime but it does run very stable. I would love to fire up a big Oracle M4000 unit to try but I have not heard from anyone anywhere that knows if that can work at all. So for now these old netra units are all that I can test with.

--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional
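When comparing several captures of this lockup, it can help to reduce each serial-console log to just its call-trace symbols. Below is a minimal sketch of that; the helper name `call_trace` is hypothetical, and the regular expression assumes the `[<address>] symbol+0xoffset/0xsize` frame format seen in the trace above.

```python
import re
from typing import List, Tuple

# Matches frames such as "[<00665838>] chrdev_open+0x98/0x1e0".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def call_trace(log: str) -> List[Tuple[str, int, int]]:
    """Return (symbol, offset, size) for each frame after 'Call Trace:'."""
    _, found, tail = log.partition("Call Trace:")
    if not found:
        return []
    return [
        (m.group(2), int(m.group(3), 16), int(m.group(4), 16))
        for m in FRAME_RE.finditer(tail)
    ]


if __name__ == "__main__":
    excerpt = (
        "[2968698.638996] Call Trace:\n"
        "[2968698.673323] [<00665838>] chrdev_open+0x98/0x1e0\n"
        "[2968698.744355] [<0065ae30>] do_dentry_open+0x170/0x420\n"
    )
    for symbol, offset, size in call_trace(excerpt):
        print(f"{symbol} +{offset:#x} / {size:#x}")
```

Because the pattern does not depend on line breaks or quote prefixes, it also works on the quoted copies of the trace elsewhere in this thread.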