Re: misc panics

2020-12-31 Thread Bastien Durel
On Monday 28 December 2020 at 12:34 +0200, Gregory Edigarov wrote:
> On 12/28/20 12:18 PM, rgc wrote:
> > On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> > > On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> > > 
> > > > On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > > > So hardware failure confirmed :/ Do you think I can change
> > > > > > the RAM or it's more likely a CPU/Chipset failure ?
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > If you have multiple sticks of RAM, try removing some.
> > > > I have only one
> > > trying to reseat it is worth a try.
> > > 
> > > -Otto
> > > 
> > or doing the eraser magick
> > 
> > you clean the contacts (remove oxidation) of the RAM module (the
> > side that sticks in the motherboard) by rubbing a pencil eraser on
> > the contacts of the RAM module.
> > 
> in my experience, all RAM modules nowadays come gold plated, so
> there is no need to use an eraser on them.
> just a piece of paper, to make sure there is no grease on the
> contacts
> 
Hello,

For those interested: a new memory module got the box running again.
(Neither cleaning the contacts nor using an eraser worked.)

Thanks,

-- 
Bastien



Re: misc panics

2020-12-28 Thread Gregory Edigarov



On 12/28/20 12:18 PM, rgc wrote:
> On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
>> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
>>
>>> On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
>>>>> So hardware failure confirmed :/ Do you think I can change the RAM
>>>>> or it's more likely a CPU/Chipset failure ?
>>>>>
>>>>> Thanks,
>>>>>
>>>> If you have multiple sticks of RAM, try removing some.
>>> I have only one
>> trying to reseat it is worth a try.
>>
>>  -Otto
>>
> or doing the eraser magick
>
> you clean the contacts (remove oxidation) of the RAM module (the side that
> sticks in the motherboard) by rubbing a pencil eraser on the contacts of the
> RAM module.
>
in my experience, all RAM modules nowadays come gold plated, so there is
no need to use an eraser on them.
just a piece of paper, to make sure there is no grease on the contacts



Re: misc panics

2020-12-28 Thread rgc
On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> 
> > On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > So hardware failure confirmed :/ Do you think I can change the RAM
> > > > or it's more likely a CPU/Chipset failure ?
> > > > 
> > > > Thanks,
> > > > 
> > > 
> > > If you have multiple sticks of RAM, try removing some.
> > I have only one
> 
> trying to reseat it is worth a try.
> 
>   -Otto
> 

or doing the eraser magick

you clean the contacts (remove oxidation) of the RAM module (the side that
sticks in the motherboard) by rubbing a pencil eraser on the contacts of the
RAM module.

- rgc



Re: misc panics

2020-12-28 Thread Otto Moerbeek
On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:

> On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > So hardware failure confirmed :/ Do you think I can change the RAM
> > > or it's more likely a CPU/Chipset failure ?
> > > 
> > > Thanks,
> > > 
> > 
> > If you have multiple sticks of RAM, try removing some.
> I have only one

trying to reseat it is worth a try.

-Otto



Re: misc panics

2020-12-28 Thread Bastien Durel
On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > So hardware failure confirmed :/ Do you think I can change the RAM
> > or it's more likely a CPU/Chipset failure ?
> > 
> > Thanks,
> > 
> 
> If you have multiple sticks of RAM, try removing some.
I have only one

-- 
Bastien



Re: misc panics

2020-12-28 Thread Stuart Henderson
On 2020-12-28, Bastien Durel  wrote:
> On Monday 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
>> On 28/12/20 3:56 am, Bastien Durel wrote:
>> > After that I got a (maybe) endless loop of panics inducing panics
>> > (I did not get the output, it was cycling fast), and after that the
>> > /bsd file was left empty :
>> > 
>> > > > > OpenBSD/amd64 BOOT 3.52
>> > > boot> NOTE: random seed is being reused.
>> > > booting hd0a:/bsd: read header
>> > >  failed(0). will try /bsd
>> …
>> > How can I figure out the cause of all these problems ?
>> 
>> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
>> Got a `memtest86` disk handy?
>> 
>> I'd be checking:
>> - RAM
>> - disks
>> - CPU
>> 
>> I think from the `dmesg` the storage device is a SSD?  Could it be it
>> has failed early?  Some do that, and they give practically no warning
>> when they do.
>
> SMART is OK on the disk
>
> I ran a memtest86 test, and got thousands of errors
>
>
> Test Start Time         2020-12-28 08:38:08
> Elapsed Time            0:01:11
> Memory Range Tested     0x0 - 16F00 (5872MB)
> CPU Selection Mode      Parallel (All CPUs)
> ECC Polling             Enabled
>
> Lowest Error Address    0x12AA18018 (4778MB)
> Highest Error Address   0x12BFE7FF8 (4799MB)
> Bits in Error Mask      FF00
> Bits in Error           8
> Max Contiguous Errors   1
>
>
>
> Test                                        # Tests Passed  Errors
> Test 0 [Address test, walking ones, 1 CPU]  1/1 (100%)      0
> Test 1 [Address test, own address, 1 CPU]   0/0 (0%)        10988
>
>
> Last 10 Errors
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8,
> Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8,
> Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58,
> Expected: 00012BFE7F58, Actual: 04012BFE7F58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48,
> Expected: 00012BFE7F48, Actual: 08012BFE7F48
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8,
> Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8,
> Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8,
> Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58,
> Expected: 00012BFE7E58, Actual: 40012BFE7E58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58,
> Expected: 00012BFE7D58, Actual: 08012BFE7D58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48,
> Expected: 00012BFE7D48, Actual: 08012BFE7D48
>
>
> So hardware failure confirmed :/ Do you think I can change the RAM or
> it's more likely a CPU/Chipset failure ?
>
> Thanks,
>

If you have multiple sticks of RAM, try removing some.




Re: misc panics

2020-12-28 Thread Bastien Durel
On Monday 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
> On 28/12/20 3:56 am, Bastien Durel wrote:
> > After that I got a (maybe) endless loop of panics inducing panics
> > (I did not get the output, it was cycling fast), and after that the
> > /bsd file was left empty :
> > 
> > > > > OpenBSD/amd64 BOOT 3.52
> > > boot> NOTE: random seed is being reused.
> > > booting hd0a:/bsd: read header
> > >  failed(0). will try /bsd
> …
> > How can I figure out the cause of all these problems ?
> 
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
> Got a `memtest86` disk handy?
> 
> I'd be checking:
> - RAM
> - disks
> - CPU
> 
> I think from the `dmesg` the storage device is a SSD?  Could it be it
> has failed early?  Some do that, and they give practically no warning
> when they do.

SMART is OK on the disk

I ran a memtest86 test, and got thousands of errors


Test Start Time         2020-12-28 08:38:08
Elapsed Time            0:01:11
Memory Range Tested     0x0 - 16F00 (5872MB)
CPU Selection Mode      Parallel (All CPUs)
ECC Polling             Enabled

Lowest Error Address    0x12AA18018 (4778MB)
Highest Error Address   0x12BFE7FF8 (4799MB)
Bits in Error Mask      FF00
Bits in Error           8
Max Contiguous Errors   1



Test                                        # Tests Passed  Errors
Test 0 [Address test, walking ones, 1 CPU]  1/1 (100%)      0
Test 1 [Address test, own address, 1 CPU]   0/0 (0%)        10988


Last 10 Errors
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8,
Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8,
Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58,
Expected: 00012BFE7F58, Actual: 04012BFE7F58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48,
Expected: 00012BFE7F48, Actual: 08012BFE7F48
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8,
Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8,
Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8,
Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58,
Expected: 00012BFE7E58, Actual: 40012BFE7E58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58,
Expected: 00012BFE7D58, Actual: 08012BFE7D58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48,
Expected: 00012BFE7D48, Actual: 08012BFE7D48
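
To read the errors above more easily, one can XOR each expected value
with the actual one and list which bits differ. This is only a rough
sketch: the helper names are made up for illustration, and the hex
values are taken exactly as they appear in this copy of the report,
which seems to have lost some digits (0x16F00 cannot be 5872MB), so
the bit numbers are relative to the printed values rather than to the
real 64-bit words.

```python
# (expected, actual) pairs copied as printed in the report above.
SAMPLES = [
    ("00012BFE7FF8", "10012BFE7FF8"),
    ("00012BFE7FE8", "04012BFE7FE8"),
    ("00012BFE7F48", "08012BFE7F48"),
    ("00012BFE7EF8", "40012BFE7EF8"),
    ("00012BFE7EE8", "C0012BFE7EE8"),
]


def flipped_bits(expected_hex, actual_hex):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = int(expected_hex, 16) ^ int(actual_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def address_to_mib(address_hex):
    """Convert a hex byte address to whole mebibytes, rounding down."""
    return int(address_hex, 16) // (1024 * 1024)


if __name__ == "__main__":
    for expected, actual in SAMPLES:
        print(expected, actual, flipped_bits(expected, actual))
    # Size of the failing window, from the lowest/highest error addresses.
    print(address_to_mib("12BFE7FF8") - address_to_mib("12AA18018"), "MiB")
```

In these samples every differing bit sits in the top byte of the
printed value while the low bits (the address itself) are intact,
which fits the "Bits in Error 8" line, and all errors fall within a
window of roughly 21MB.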


So hardware failure confirmed :/ Do you think I can change the RAM or
it's more likely a CPU/Chipset failure ?

Thanks,

-- 
Bastien Durel





Re: misc panics

2020-12-27 Thread Stuart Henderson
On 2020-12-27, Stuart Longland  wrote:
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue. 

Not if it crashed at a bad point in "reorder_kernel".

I would try GENERIC instead of GENERIC.MP to see if there's any change.




Re: misc panics

2020-12-27 Thread Stuart Longland

On 28/12/20 3:56 am, Bastien Durel wrote:
> After that I got a (maybe) endless loop of panics inducing panics (I did
> not get the output, it was cycling fast), and after that the /bsd file
> was left empty :
>
>> OpenBSD/amd64 BOOT 3.52
>> boot> NOTE: random seed is being reused.
>> booting hd0a:/bsd: read header
>>  failed(0). will try /bsd

…

> How can I figure out the cause of all these problems ?


Seems awfully strange for `/bsd` to become zero-length out-of-the-blue. 
 Got a `memtest86` disk handy?


I'd be checking:
- RAM
- disks
- CPU

I think from the `dmesg` the storage device is a SSD?  Could it be it 
has failed early?  Some do that, and they give practically no warning 
when they do.

--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



misc panics

2020-12-27 Thread Bastien Durel

Hello,

While I was on vacation, my home router crashed. I power-cycled it as soon
as I returned, as the console was unresponsive (I access it via serial
console, so lines in the following traces may be truncated).


It then crashed a few seconds after finishing boot:


OpenBSD/amd64 (fremen.geekwu.org) (tty00)

login: panic: pool_do_get: art_table free list modified: page 0xfd812b0ac000; item addr 0xfdc
Starting stack trace...
panic(81e7958a) at panic+0x11d
pool_do_get(82196790,a,8000339a7f14) at pool_do_get+0x321
pool_get(82196790,a) at pool_get+0x8f
art_table_get(80294480,fd812b0ac1a8,15) at art_table_get+0x7d
art_insert(80294480,fd812b2f2480,8193c908,2d) at art_insert+0xe3
rtable_insert(0,8193c900,880213a0,88021380,30,fd812b3cc738) at rtable_inc
rtrequest(1,8000339a8298,30,8000339a8210,0) at rtrequest+0x4fe
rtm_output(88021300,8000339a8340,8000339a8298,30,0) at rtm_output+0x526
route_output(fd80784e0b00,fd814a2f7b30,0,0) at route_output+0x396
route_usrreq(fd814a2f7b30,9,fd80784e0b00,0,0,8000227e4290) at route_usrreq+0x21a
sosend(fd814a2f7b30,0,8000339a8610,0,0,80) at sosend+0x36c
dofilewritev(8000227e4290,6,8000339a8610,0,8000339a8710) at dofilewritev+0x14d
sys_writev(8000227e4290,8000339a86b0,8000339a8710) at sys_writev+0xe2
syscall(8000339a8780) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d5530, count: 242
End of stack trace.
syncing disks...18 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10  
giving up


After I unplugged all but one ethernet cable, I had time to log in and 
enter a few commands (including syspatch, that only upgraded smtpd) 
before another crash :


fremen# ospfctl sh ne
ID              Pri State        DeadTime Address         Iface     Uptime
51.255.165.194  1   DOWN/DOWN    00:02:05 10.120.0.58     wg3       -
151.80.16.138   1   FULL/DR      00:00:17 10.120.0.2      tap0      00:01:30
10.42.42.21     1   2-WAY/WAIT   00:00:39 10.42.42.21     em1       -
fremen# fatal protection fault in supervisor mode
trap type 4 code 0 rip 81131c89 cs 8 rflags 10282 cr2 207673f0010 cpl 0 rsp 800022717250
gsbase 0x8211cff0  kgsbase 0x0
panic: trap type 4, code=0, pc=81131c89
Starting stack trace...
panic(81de4ae4,81de4ae4,8000227171a0,4,800022717268,821e3d18) at pand
kerntrap(8000227171a0,8000227171a0,8b36cc049b52fe43,f7fffd812f03fdd0,fd812f03ff90,6148604
alltraps_kern_meltdown(4,6148601dcf42cca0,800022717264,0,f7fffd812f03fdd0,fd812f03ff90) at ab
pool_p_free(821e3d18,fd812f03ff90,3c3fbd5ca1bd0af6,fd812f03ff90,821e3d18,fff9
pool_put(821e3d18,fd812a7379d0,fa2dd1afa1e566dc,0,fd812a7379d0,0) at pool_put+0x16b
art_gc(0,0,cf984c543a977bcb,82114d50,82114d38,1) at art_gc+0x71
taskq_thread(82114d38,82114d38,0,0,82114d38,81be29b0) at taskq_threa1
end trace frame: 0x0, count: 250
End of stack trace.


After that I got what may have been an endless loop of panics inducing
panics (I could not capture the output, it was cycling too fast), and
after that the /bsd file was left empty:



OpenBSD/amd64 BOOT 3.52
boot> 
NOTE: random seed is being reused.

booting hd0a:/bsd: read header
 failed(0). will try /bsd


I booted /bsd.booted and got another crash at the end of boot sequence :


boot> boot /bsd.booted
NOTE: random seed is being reused.
booting hd0a:/bsd.booted: 14415144+3195920+344096+0+880640 
[497303+128+1138200+861220]=0x145ad58
entry point at 0x81001000
0
ff810010[ using 2497880 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.8 (GENERIC.MP) #2: Sat Dec  5 07:17:48 MST 2020

r...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4196302848 (4001MB)
avail mem = 4054564864 (3866MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x8ce22000 (85 entries)
bios0: vendor American Megatrends Inc. version "5.12" date 11/23/2018
bios0: Default string Default string
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET SSDT SSDT UEFI SSDT 
LPIT SSDT SSDT SSDT ST
acpi0: wakeup devices RP09(S3) PXSX(S3) RP10(S3) PXSX(S3) RP11(S3) PXSX(S3) 
RP12(S3) PXSX(S3) RP13(S]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) CPU 3855U @ 1.60GHz, 1596.81 MHz, 06-4e-03
cpu0: