Re: misc panics
On Monday 28 December 2020 at 12:34 +0200, Gregory Edigarov wrote:
> On 12/28/20 12:18 PM, rgc wrote:
> > On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> > > On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> > > > On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > > > So hardware failure confirmed :/ Do you think I can change
> > > > > > the RAM or is it more likely a CPU/chipset failure?
> > > > > >
> > > > > > Thanks,
> > > > >
> > > > > If you have multiple sticks of RAM, try removing some.
> > > >
> > > > I have only one.
> > >
> > > Trying to reseat it is worth a try.
> > >
> > > -Otto
> >
> > Or do the eraser magic: you clean the contacts (remove oxidation) of
> > the RAM module (the side that sticks into the motherboard) by rubbing
> > a pencil eraser on the contacts of the RAM module.
>
> In my experience, all RAM modules nowadays come gold-plated, so there
> is no need to use an eraser on them. Just use a piece of paper to make
> sure there is no grease on the contacts.

Hello,

For those interested, a new memory module got the box running again.
(Neither cleaning nor using an eraser worked.)

Thanks,
--
Bastien
Re: misc panics
On 12/28/20 12:18 PM, rgc wrote:
> On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
>> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
>>> On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
>>>>> So hardware failure confirmed :/ Do you think I can change the RAM
>>>>> or is it more likely a CPU/chipset failure?
>>>>>
>>>>> Thanks,
>>>>
>>>> If you have multiple sticks of RAM, try removing some.
>>>
>>> I have only one.
>>
>> Trying to reseat it is worth a try.
>>
>> -Otto
>
> Or do the eraser magic: you clean the contacts (remove oxidation) of
> the RAM module (the side that sticks into the motherboard) by rubbing
> a pencil eraser on the contacts of the RAM module.

In my experience, all RAM modules nowadays come gold-plated, so there is
no need to use an eraser on them. Just use a piece of paper to make sure
there is no grease on the contacts.
Re: misc panics
On Mon, Dec 28, 2020 at 10:39:56AM +0100, Otto Moerbeek wrote:
> On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> > On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > > So hardware failure confirmed :/ Do you think I can change the RAM
> > > > or is it more likely a CPU/chipset failure?
> > > >
> > > > Thanks,
> > >
> > > If you have multiple sticks of RAM, try removing some.
> >
> > I have only one.
>
> Trying to reseat it is worth a try.
>
> -Otto

Or do the eraser magic: you clean the contacts (remove oxidation) of the
RAM module (the side that sticks into the motherboard) by rubbing a
pencil eraser on the contacts of the RAM module.

- rgc
Re: misc panics
On Mon, Dec 28, 2020 at 10:25:08AM +0100, Bastien Durel wrote:
> On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > > So hardware failure confirmed :/ Do you think I can change the RAM
> > > or is it more likely a CPU/chipset failure?
> > >
> > > Thanks,
> >
> > If you have multiple sticks of RAM, try removing some.
>
> I have only one.

Trying to reseat it is worth a try.

-Otto
Re: misc panics
On Monday 28 December 2020 at 09:17 +, Stuart Henderson wrote:
> > So hardware failure confirmed :/ Do you think I can change the RAM
> > or is it more likely a CPU/chipset failure?
> >
> > Thanks,
>
> If you have multiple sticks of RAM, try removing some.

I have only one.

--
Bastien
Re: misc panics
On 2020-12-28, Bastien Durel wrote:
> On Monday 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
>> On 28/12/20 3:56 am, Bastien Durel wrote:
>> > After that I got a (maybe) endless loop of panics inducing panics
>> > (I did not get the output, it was cycling fast), and after that the
>> > /bsd file was left empty:
>> >
>> > > OpenBSD/amd64 BOOT 3.52
>> > > boot> NOTE: random seed is being reused.
>> > > booting hd0a:/bsd: read header
>> > > failed(0). will try /bsd
>> …
>> > How can I figure out the cause of all these problems?
>>
>> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
>> Got a `memtest86` disk handy?
>>
>> I'd be checking:
>> - RAM
>> - disks
>> - CPU
>>
>> I think from the `dmesg` the storage device is an SSD? Could it be it
>> has failed early? Some do that, and they give practically no warning
>> when they do.
>
> SMART is OK on the disk.
>
> I ran a memtest86 test, and got thousands of errors:
>
> Test Start Time          2020-12-28 08:38:08
> Elapsed Time             0:01:11
> Memory Range Tested      0x0 - 16F00 (5872MB)
> CPU Selection Mode       Parallel (All CPUs)
> ECC Polling              Enabled
>
> Lowest Error Address     0x12AA18018 (4778MB)
> Highest Error Address    0x12BFE7FF8 (4799MB)
> Bits in Error Mask       FF00
> Bits in Error            8
> Max Contiguous Errors    1
>
> Test                                         # Tests Passed   Errors
> Test 0 [Address test, walking ones, 1 CPU]   1/1 (100%)       0
> Test 1 [Address test, own address, 1 CPU]    0/0 (0%)         10988
>
> Last 10 Errors
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8, Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8, Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58, Expected: 00012BFE7F58, Actual: 04012BFE7F58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48, Expected: 00012BFE7F48, Actual: 08012BFE7F48
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8, Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8, Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8, Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58, Expected: 00012BFE7E58, Actual: 40012BFE7E58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58, Expected: 00012BFE7D58, Actual: 08012BFE7D58
> 2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48, Expected: 00012BFE7D48, Actual: 08012BFE7D48
>
> So hardware failure confirmed :/ Do you think I can change the RAM or
> is it more likely a CPU/chipset failure?
>
> Thanks,

If you have multiple sticks of RAM, try removing some.
Re: misc panics
On Monday 28 December 2020 at 09:23 +1000, Stuart Longland wrote:
> On 28/12/20 3:56 am, Bastien Durel wrote:
> > After that I got a (maybe) endless loop of panics inducing panics
> > (I did not get the output, it was cycling fast), and after that the
> > /bsd file was left empty:
> >
> > > OpenBSD/amd64 BOOT 3.52
> > > boot> NOTE: random seed is being reused.
> > > booting hd0a:/bsd: read header
> > > failed(0). will try /bsd
> …
> > How can I figure out the cause of all these problems?
>
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
> Got a `memtest86` disk handy?
>
> I'd be checking:
> - RAM
> - disks
> - CPU
>
> I think from the `dmesg` the storage device is an SSD? Could it be it
> has failed early? Some do that, and they give practically no warning
> when they do.

SMART is OK on the disk.

I ran a memtest86 test, and got thousands of errors:

Test Start Time          2020-12-28 08:38:08
Elapsed Time             0:01:11
Memory Range Tested      0x0 - 16F00 (5872MB)
CPU Selection Mode       Parallel (All CPUs)
ECC Polling              Enabled

Lowest Error Address     0x12AA18018 (4778MB)
Highest Error Address    0x12BFE7FF8 (4799MB)
Bits in Error Mask       FF00
Bits in Error            8
Max Contiguous Errors    1

Test                                         # Tests Passed   Errors
Test 0 [Address test, walking ones, 1 CPU]   1/1 (100%)       0
Test 1 [Address test, own address, 1 CPU]    0/0 (0%)         10988

Last 10 Errors
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FF8, Expected: 00012BFE7FF8, Actual: 10012BFE7FF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7FE8, Expected: 00012BFE7FE8, Actual: 04012BFE7FE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F58, Expected: 00012BFE7F58, Actual: 04012BFE7F58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7F48, Expected: 00012BFE7F48, Actual: 08012BFE7F48
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EF8, Expected: 00012BFE7EF8, Actual: 40012BFE7EF8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EE8, Expected: 00012BFE7EE8, Actual: C0012BFE7EE8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7EC8, Expected: 00012BFE7EC8, Actual: 04012BFE7EC8
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7E58, Expected: 00012BFE7E58, Actual: 40012BFE7E58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D58, Expected: 00012BFE7D58, Actual: 08012BFE7D58
2020-12-28 08:39:19 - [Data Error] Test: 1, CPU: 0, Address: 12BFE7D48, Expected: 00012BFE7D48, Actual: 08012BFE7D48

So hardware failure confirmed :/ Do you think I can change the RAM or is
it more likely a CPU/chipset failure?

Thanks,
--
Bastien Durel
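A quick way to read the memtest86 "Last 10 Errors" above is to XOR each Expected/Actual pair and list the bits that flipped. The Python sketch below does that, ORs the differences into one mask, and converts the lowest/highest error addresses to MB the way memtest86 labels them. It is a hypothetical helper written for this data only (the names `flipped_bits`, `error_mask` and `to_mib` are illustrative, not part of memtest86). For these ten errors, every flipped bit lies between bit 40 and bit 47, i.e. inside a single byte of the 64-bit word, which is consistent with the reported "Bits in Error 8".

```python
"""Decode the memtest86 "Last 10 Errors" quoted in this thread.

Hypothetical helper script: all names here are illustrative only.
"""

# (expected, actual) pairs copied from the "Last 10 Errors" list.
ERRORS = [
    (0x00012BFE7FF8, 0x10012BFE7FF8),
    (0x00012BFE7FE8, 0x04012BFE7FE8),
    (0x00012BFE7F58, 0x04012BFE7F58),
    (0x00012BFE7F48, 0x08012BFE7F48),
    (0x00012BFE7EF8, 0x40012BFE7EF8),
    (0x00012BFE7EE8, 0xC0012BFE7EE8),
    (0x00012BFE7EC8, 0x04012BFE7EC8),
    (0x00012BFE7E58, 0x40012BFE7E58),
    (0x00012BFE7D58, 0x08012BFE7D58),
    (0x00012BFE7D48, 0x08012BFE7D48),
]


def flipped_bits(expected, actual):
    """Return positions (0 = least significant) of bits that differ."""
    diff = expected ^ actual
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def error_mask(errors):
    """OR together every XOR difference: the union of all bad bits."""
    mask = 0
    for expected, actual in errors:
        mask |= expected ^ actual
    return mask


def to_mib(address):
    """Convert a byte address to whole MiB (memtest86 prints it as MB)."""
    return address >> 20


if __name__ == "__main__":
    for expected, actual in ERRORS:
        print(f"{actual:012X}: bits {flipped_bits(expected, actual)}")
    print(f"union mask: 0x{error_mask(ERRORS):012X}")
    print(f"lowest/highest error: {to_mib(0x12AA18018)}MB / {to_mib(0x12BFE7FF8)}MB")
```

For this data the union mask comes out as 0xDC0000000000 (bits 42, 43, 44, 46 and 47), and the addresses map to 4778MB and 4799MB, matching the summary. Errors confined to one byte lane tend to point at the memory module rather than the CPU, though the script alone cannot prove that.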
Re: misc panics
On 2020-12-27, Stuart Longland wrote:
> Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.

Not if it crashed at a bad point in "reorder_kernel".

I would try GENERIC instead of GENERIC.MP to see if there's any change.
Re: misc panics
On 28/12/20 3:56 am, Bastien Durel wrote:
> After that I got a (maybe) endless loop of panics inducing panics
> (I did not get the output, it was cycling fast), and after that the
> /bsd file was left empty:
>
> > OpenBSD/amd64 BOOT 3.52
> > boot> NOTE: random seed is being reused.
> > booting hd0a:/bsd: read header
> > failed(0). will try /bsd
…
> How can I figure out the cause of all these problems?

Seems awfully strange for `/bsd` to become zero-length out-of-the-blue.
Got a `memtest86` disk handy?

I'd be checking:
- RAM
- disks
- CPU

I think from the `dmesg` the storage device is an SSD? Could it be it
has failed early? Some do that, and they give practically no warning
when they do.

--
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.