Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Patrick M. Hausen
Hi all, > On 13.02.2024 at 20:56, Pete Wright wrote: > 1. M.2 nvme really does need proper cooling, much more so than traditional > SATA/SAS/SCSI drives. I recently found a tool named "Scrutiny" that presents a nice dashboard of all your disk devices and their SMART data including crucial

Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Craig Leres
I had issues with an nvme drive in an Intel NUC. When I asked freebsd-hackers, overheating was the first guess: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-May/052783.html I blew the dust out of the fan assembly and changed the BIOS fan settings to be more aggressive and the

Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Pete Wright
There's a tiny chance that this could be something more exotic, but my money is on hardware gone bad after 2 years of service. I don't think this is 'wear out' of the NAND (it's only 15TB written, but it could be if this drive is really really crappy nand: first generation QLC maybe, but it seems

Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Don Lewis
On 12 Feb, Warner Losh wrote: > On Mon, Feb 12, 2024 at 9:15 PM Don Lewis wrote: > >> On 12 Feb, Maxim Sobolev wrote: >> > Might be an overheating. Today's nvme drives are notoriously flaky if you >> > run them without proper heat sink attached to it. >> >> I don't think it is a thermal problem.

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Warner Losh
On Mon, Feb 12, 2024 at 9:15 PM Don Lewis wrote: > On 12 Feb, Maxim Sobolev wrote: > > Might be an overheating. Today's nvme drives are notoriously flaky if you > > run them without proper heat sink attached to it. > > I don't think it is a thermal problem. According to the drive health > page,

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
On 12 Feb, Maxim Sobolev wrote: > Might be an overheating. Today's nvme drives are notoriously flaky if you > run them without proper heat sink attached to it. I don't think it is a thermal problem. According to the drive health page, the device temperature has never reached Temperature 2,

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
On 12 Feb, Mark Johnston wrote: > On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote: >> I just upgraded my package build machine to: >> FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e >> from: >> FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 >> and I've had two nvme-triggered

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Mark Johnston
On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote: > I just upgraded my package build machine to: > FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e > from: > FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 > and I've had two nvme-triggered panics in the last day. > > nvme is

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Maxim Sobolev
Might be an overheating. Today's nvme drives are notoriously flaky if you run them without proper heat sink attached to it. -Max On Mon, Feb 12, 2024, 4:28 PM Don Lewis wrote: > I just upgraded my package build machine to: > FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e > from: >

nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
I just upgraded my package build machine to: FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e from: FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 and I've had two nvme-triggered panics in the last day. nvme is being used for swap and L2ARC. I'm not able to get a crash dump, probably