Re: misc panics
On Monday, 28 December 2020 at 12:34 +0200, Gregory Edigarov wrote:
> On 12/28/20 12:18 PM, rgc wrote:
> > On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> > > On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> > > > On Monday, 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > > > So hardware failure confirmed :/ Do you think I can change
> > > > > > the RAM, or is it more likely a CPU/chipset failure?
> > > > > >
> > > > > > Thanks,
> > > > >
> > > > > If you have multiple sticks of RAM, try removing some.
> > > >
> > > > I have only one
> > >
> > > trying to reseat it is worth a try.
> > >
> > > -Otto
> >
> > or doing the eraser magick
> >
> > you clean the contacts (remove oxidation) of the RAM module (the side
> > that sticks in the motherboard) by rubbing a pencil eraser on the
> > contacts of the RAM module.
>
> in my experience, all RAM modules nowadays come gold plated, so there is
> no need to use an eraser on them.
> just a piece of paper, to make sure there is no grease on the contacts

Hello,

For those interested, a new memory module made the box able to run again.
(Neither cleaning nor using an eraser worked.)

Thanks,

-- 
Bastien
Re: misc panics
On 12/28/20 12:18 PM, rgc wrote:
> On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
>> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
>>> On Monday, 28 December 2020 at 09:17 +, Stuart Henderson wrote:
>>>>> So hardware failure confirmed :/ Do you think I can change the RAM,
>>>>> or is it more likely a CPU/chipset failure?
>>>>>
>>>>> Thanks,
>>>>
>>>> If you have multiple sticks of RAM, try removing some.
>>>
>>> I have only one
>>
>> trying to reseat it is worth a try.
>>
>> -Otto
>
> or doing the eraser magick
>
> you clean the contacts (remove oxidation) of the RAM module (the side that
> sticks in the motherboard) by rubbing a pencil eraser on the contacts of the
> RAM module.

in my experience, all RAM modules nowadays come gold plated, so there is no
need to use an eraser on them.
just a piece of paper, to make sure there is no grease on the contacts
Re: misc panics
On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> > On Monday, 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > So hardware failure confirmed :/ Do you think I can change the RAM,
> > > > or is it more likely a CPU/chipset failure?
> > > >
> > > > Thanks,
> > >
> > > If you have multiple sticks of RAM, try removing some.
> >
> > I have only one
>
> trying to reseat it is worth a try.
>
> -Otto

or doing the eraser magick

you clean the contacts (remove oxidation) of the RAM module (the side that
sticks in the motherboard) by rubbing a pencil eraser on the contacts of the
RAM module.

- rgc
Re: misc panics
On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> On Monday, 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > So hardware failure confirmed :/ Do you think I can change the RAM,
> > > or is it more likely a CPU/chipset failure?
> > >
> > > Thanks,
> >
> > If you have multiple sticks of RAM, try removing some.
>
> I have only one

trying to reseat it is worth a try.

-Otto
Re: misc panics
On Monday, 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > So hardware failure confirmed :/ Do you think I can change the RAM,
> > or is it more likely a CPU/chipset failure?
> >
> > Thanks,
>
> If you have multiple sticks of RAM, try removing some.

I have only one

-- 
Bastien
Re: misc panics
On 2020-12-28, Bastien Durel wrote:
> On Monday, 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
>> On 28/12/20 3:56 am, Bastien Durel wrote:
>> > After that I got a (maybe) endless loop of panics inducing panics
>> > (I did not get the output, it was cycling fast), and after that the
>> > /bsd file was left empty:
>> >
>> > OpenBSD/amd64 BOOT 3.52
>> > boot> NOTE: random seed is being reused.
>> > booting hd0a:/bsd: read header
>> > failed(0). will try /bsd
>> …
>> > How can I figure out the cause of all these problems?
>>
>> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
>> Got a `memtest86` disk handy?
>>
>> I'd be checking:
>> - RAM
>> - disks
>> - CPU
>>
>> I think from the `dmesg` the storage device is an SSD? Could it be it
>> has failed early? Some do that, and they give practically no warning
>> when they do.
>
> SMART is OK on the disk
>
> I ran a memtest86 test, and got thousands of errors
>
> Test Start Time          2020-12-28 08:38:08
> Elapsed Time             0:01:11
> Memory Range Tested      0x0 - 16F00 (5872MB)
> CPU Selection Mode       Parallel (All CPUs)
> ECC Polling              Enabled
>
> Lowest Error Address     0x12AA18018 (4778MB)
> Highest Error Address    0x12BFE7FF8 (4799MB)
> Bits in Error Mask       FF00
> Bits in Error            8
> Max Contiguous Errors    1
>
> Test                                          # Tests Passed   Errors
> Test 0 [Address test, walking ones, 1 CPU]    1/1 (100%)       0
> Test 1 [Address test, own address, 1 CPU]     0/0 (0%)         10988
>
> Last 10 Errors
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8,
> Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8,
> Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58,
> Expected: 00012BFE7F58, Actual: 04012BFE7F58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48,
> Expected: 00012BFE7F48, Actual: 08012BFE7F48
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8,
> Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8,
> Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8,
> Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58,
> Expected: 00012BFE7E58, Actual: 40012BFE7E58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58,
> Expected: 00012BFE7D58, Actual: 08012BFE7D58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48,
> Expected: 00012BFE7D48, Actual: 08012BFE7D48
>
> So hardware failure confirmed :/ Do you think I can change the RAM, or
> is it more likely a CPU/chipset failure?
>
> Thanks,

If you have multiple sticks of RAM, try removing some.
Re: misc panics
On Monday, 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
> On 28/12/20 3:56 am, Bastien Durel wrote:
> > After that I got a (maybe) endless loop of panics inducing panics
> > (I did not get the output, it was cycling fast), and after that the
> > /bsd file was left empty:
> >
> > OpenBSD/amd64 BOOT 3.52
> > boot> NOTE: random seed is being reused.
> > booting hd0a:/bsd: read header
> > failed(0). will try /bsd
> …
> > How can I figure out the cause of all these problems?
>
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
> Got a `memtest86` disk handy?
>
> I'd be checking:
> - RAM
> - disks
> - CPU
>
> I think from the `dmesg` the storage device is an SSD? Could it be it
> has failed early? Some do that, and they give practically no warning
> when they do.

SMART is OK on the disk

I ran a memtest86 test, and got thousands of errors

Test Start Time          2020-12-28 08:38:08
Elapsed Time             0:01:11
Memory Range Tested      0x0 - 16F00 (5872MB)
CPU Selection Mode       Parallel (All CPUs)
ECC Polling              Enabled

Lowest Error Address     0x12AA18018 (4778MB)
Highest Error Address    0x12BFE7FF8 (4799MB)
Bits in Error Mask       FF00
Bits in Error            8
Max Contiguous Errors    1

Test                                          # Tests Passed   Errors
Test 0 [Address test, walking ones, 1 CPU]    1/1 (100%)       0
Test 1 [Address test, own address, 1 CPU]     0/0 (0%)         10988

Last 10 Errors
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8, Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8, Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58, Expected: 00012BFE7F58, Actual: 04012BFE7F58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48, Expected: 00012BFE7F48, Actual: 08012BFE7F48
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8, Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8, Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8, Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58, Expected: 00012BFE7E58, Actual: 40012BFE7E58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58, Expected: 00012BFE7D58, Actual: 08012BFE7D58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48, Expected: 00012BFE7D48, Actual: 08012BFE7D48

So hardware failure confirmed :/ Do you think I can change the RAM, or is
it more likely a CPU/chipset failure?

Thanks,

-- 
Bastien Durel
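
For readers who want to see which bits the memtest86 report is complaining about, here is a minimal Python sketch. It XORs each expected/actual pair from the "Last 10 Errors" list and ORs the results together, then converts the lowest and highest error addresses to MB. The values are copied exactly as they appear in the report above; the archive may have dropped digits, so the bit positions are relative to the values as shown, and the names `ERRORS`, `LOWEST` and `HIGHEST` are only illustrative.

```python
# Expected/actual pairs copied as shown in the "Last 10 Errors" list.
# Assumption: the values are used exactly as printed in this thread.
ERRORS = [
    (0x00012BFE7FF8, 0x10012BFE7FF8),
    (0x00012BFE7FE8, 0x04012BFE7FE8),
    (0x00012BFE7F58, 0x04012BFE7F58),
    (0x00012BFE7F48, 0x08012BFE7F48),
    (0x00012BFE7EF8, 0x40012BFE7EF8),
    (0x00012BFE7EE8, 0xC0012BFE7EE8),
    (0x00012BFE7EC8, 0x04012BFE7EC8),
    (0x00012BFE7E58, 0x40012BFE7E58),
    (0x00012BFE7D58, 0x08012BFE7D58),
    (0x00012BFE7D48, 0x08012BFE7D48),
]

# XOR isolates the bits that differ between what was written and read back;
# OR-ing every result gives the set of bits seen in error at least once.
combined = 0
for expected, actual in ERRORS:
    combined |= expected ^ actual

low_mask = (1 << 40) - 1  # everything below the top byte of the shown values
print(f"combined error mask: {combined:#014x}")
print("distinct bits in error:", bin(combined).count("1"))
print("any error in the low 40 bits:", bool(combined & low_mask))

# Error address window, converted to MB (1 MB = 2**20 bytes here).
LOWEST = 0x12AA18018
HIGHEST = 0x12BFE7FF8
MIB = 1 << 20
print("error window:", LOWEST // MIB, "MB to", HIGHEST // MIB, "MB")
```

In these ten samples every flipped bit sits in the top byte of the displayed values and the low bits always read back correctly, and the MB figures match the 4778MB and 4799MB printed in the report.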
Re: misc panics
On 2020-12-27, Stuart Longland wrote:
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.

Not if it crashed at a bad point in "reorder_kernel".

I would try GENERIC instead of GENERIC.MP to see if there's any change.
Re: misc panics
On 28/12/20 3:56 am, Bastien Durel wrote:
> After that I got a (maybe) endless loop of panics inducing panics (I did
> not get the output, it was cycling fast), and after that the /bsd file
> was left empty:
>
> OpenBSD/amd64 BOOT 3.52
> boot> NOTE: random seed is being reused.
> booting hd0a:/bsd: read header
> failed(0). will try /bsd
…
> How can I figure out the cause of all these problems?

Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
Got a `memtest86` disk handy?

I'd be checking:
- RAM
- disks
- CPU

I think from the `dmesg` the storage device is an SSD? Could it be it has
failed early? Some do that, and they give practically no warning when
they do.

-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
misc panics
Hello,

While I was on vacation, my home router crashed. I power-cycled it as soon
as I returned, as the console was unresponsive (I access it via serial
console, so lines in the following traces may be truncated).

It then crashed a few seconds after finishing boot:

OpenBSD/amd64 (fremen.geekwu.org) (tty00)

login: panic: pool_do_get: art_table free list modified: page 0xfd812b0ac000; item addr 0xfdc
Starting stack trace...
panic(81e7958a) at panic+0x11d
pool_do_get(82196790,a,8000339a7f14) at pool_do_get+0x321
pool_get(82196790,a) at pool_get+0x8f
art_table_get(80294480,fd812b0ac1a8,15) at art_table_get+0x7d
art_insert(80294480,fd812b2f2480,8193c908,2d) at art_insert+0xe3
rtable_insert(0,8193c900,880213a0,88021380,30,fd812b3cc738) at rtable_inc
rtrequest(1,8000339a8298,30,8000339a8210,0) at rtrequest+0x4fe
rtm_output(88021300,8000339a8340,8000339a8298,30,0) at rtm_output+0x526
route_output(fd80784e0b00,fd814a2f7b30,0,0) at route_output+0x396
route_usrreq(fd814a2f7b30,9,fd80784e0b00,0,0,8000227e4290) at route_usrreq+0x21a
sosend(fd814a2f7b30,0,8000339a8610,0,0,80) at sosend+0x36c
dofilewritev(8000227e4290,6,8000339a8610,0,8000339a8710) at dofilewritev+0x14d
sys_writev(8000227e4290,8000339a86b0,8000339a8710) at sys_writev+0xe2
syscall(8000339a8780) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d5530, count: 242
End of stack trace.
syncing disks...18 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 giving up

After I unplugged all but one Ethernet cable, I had time to log in and enter
a few commands (including syspatch, which only upgraded smtpd) before
another crash:

fremen# ospfctl sh ne
ID              Pri State        DeadTime Address         Iface     Uptime
51.255.165.194  1   DOWN/DOWN    00:02:05 10.120.0.58     wg3       -
151.80.16.138   1   FULL/DR      00:00:17 10.120.0.2      tap0      00:01:30
10.42.42.21     1   2-WAY/WAIT   00:00:39 10.42.42.21     em1       -
fremen# fatal protection fault in supervisor mode
trap type 4 code 0 rip 81131c89 cs 8 rflags 10282 cr2 207673f0010 cpl 0 rsp 800022717250
gsbase 0x8211cff0 kgsbase 0x0
panic: trap type 4, code=0, pc=81131c89
Starting stack trace...
panic(81de4ae4,81de4ae4,8000227171a0,4,800022717268,821e3d18) at pand
kerntrap(8000227171a0,8000227171a0,8b36cc049b52fe43,f7fffd812f03fdd0,fd812f03ff90,6148604
alltraps_kern_meltdown(4,6148601dcf42cca0,800022717264,0,f7fffd812f03fdd0,fd812f03ff90) at ab
pool_p_free(821e3d18,fd812f03ff90,3c3fbd5ca1bd0af6,fd812f03ff90,821e3d18,fff9
pool_put(821e3d18,fd812a7379d0,fa2dd1afa1e566dc,0,fd812a7379d0,0) at pool_put+0x16b
art_gc(0,0,cf984c543a977bcb,82114d50,82114d38,1) at art_gc+0x71
taskq_thread(82114d38,82114d38,0,0,82114d38,81be29b0) at taskq_threa1
end trace frame: 0x0, count: 250
End of stack trace.

After that I got a (maybe) endless loop of panics inducing panics (I did
not get the output, it was cycling fast), and after that the /bsd file was
left empty:

OpenBSD/amd64 BOOT 3.52
boot> NOTE: random seed is being reused.
booting hd0a:/bsd: read header
failed(0). will try /bsd

I booted /bsd.booted and got another crash at the end of the boot sequence:

boot> boot /bsd.booted
NOTE: random seed is being reused.
booting hd0a:/bsd.booted: 14415144+3195920+344096+0+880640 [497303+128+1138200+861220]=0x145ad58
entry point at 0x81001000 0 ff810010
[ using 2497880 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.
All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved. https://www.OpenBSD.org

OpenBSD 6.8 (GENERIC.MP) #2: Sat Dec 5 07:17:48 MST 2020
    r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4196302848 (4001MB)
avail mem = 4054564864 (3866MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x8ce22000 (85 entries)
bios0: vendor American Megatrends Inc. version "5.12" date 11/23/2018
bios0: Default string Default string
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET SSDT SSDT UEFI SSDT LPIT SSDT SSDT SSDT ST
acpi0: wakeup devices RP09(S3) PXSX(S3) RP10(S3) PXSX(S3) RP11(S3) PXSX(S3) RP12(S3) PXSX(S3) RP13(S]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) CPU 3855U @ 1.60GHz, 1596.81 MHz, 06-4e-03
cpu0: