Simon Burge wrote:
> Izumi Tsutsui wrote:
> > > [ ... ]
> >
> > I have similar problems, as filed in PR/54790.
> > https://gnats.netbsd.org/54790
> >
> > No "ahcisata0 port 1: device present, speed: 3.0Gb/s"
> > but many soft errors. No error on NetBSD 8.1 GENERIC.
> >
> > The problem here doesn't happen once after disabling NCQ by
> > "hw.wd2.use_ncq=0" in /etc/sysctl.conf.
> >
> > One interesting point:
> >
> > Yours are
> > > wd0: <Samsung SSD 860 EVO 1TB>
> > [ ... ]
> >
> > Mine is also Samsung SSD 860 EVO:
> > [ ... ]
> >
> > No errors on the following drives on my other machine:
> > > wd0 at atabus0 drive 0
> > > wd0: <TS1TSSD230S>
> > [ ... ]
> >
> > There might be some device quirk around NCQ of Samsung SSD 860 EVO?
>
> Sounds likely?

[ ... ]

A bit more digging shows that this seems to be a (somewhat) known
problem with Samsung 860 EVO disks and AMD SB710/750 chipsets.  The
problem also occurs on Windows and Linux with these drives and
chipsets.  Here are a couple of links:

https://eu.community.samsung.com/t5/Cameras-IT-Everything-Else/860-EVO-250GB-causing-freezes-on-AMD-system/td-p/575813
https://bugzilla.kernel.org/show_bug.cgi?id=201693

The first includes some discussion of how Samsung haven't released a
firmware update to address this.

Jaromir, do you have any suggestions on how we can deal with this,
particularly with the soon-to-be-released NetBSD 9.0 in mind?  Maybe
by somehow detecting this combination of disks and chipsets and
disabling NCQ?  My concern right now is that people upgrading from
NetBSD 8 or earlier to NetBSD 9 may experience data corruption.

For now, I'm running with this in /etc/rc.d/sysctl0 (it needs to run
before fsck of /):

#!/bin/sh
#
# BEFORE: fsck_root

$_rc_subr_loaded . /etc/rc.subr

name="sysctl0"
start_cmd="sysctl0_start"
stop_cmd=":"

sysctl0_start()
{
	for d in $(dmesg -t | awk -F: '/^wd.*860 EVO/ { print $1 }'); do
		sysctl -w hw.${d}.use_ncq=0
	done
}

load_rc_config $name
run_rc_command "$1"

Cheers,
Simon.
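Before installing something like the rc.d script above, it may help to check which drives the awk pattern actually picks up. What follows is a minimal dry-run sketch, not part of the original setup: it reuses the same /^wd.*860 EVO/ pattern, but reads dmesg-style text on stdin and only prints the sysctl commands instead of running them. The function names evo860_drives and ncq_dryrun are hypothetical, the sort -u de-duplication is an addition, and no attempt is made to detect the SB710/750 chipset.

```shell
#!/bin/sh
#
# Dry-run sketch (assumption: not part of the original rc.d setup).
# Reads dmesg-style text on stdin; nothing is changed on the system.

# Hypothetical helper: print one wd(4) drive name (e.g. "wd0") per
# attach line that names a Samsung 860 EVO.  -F: splits
# "wd0: <Samsung SSD 860 EVO 1TB>" at the colon, so $1 is "wd0".
# sort -u drops duplicates if the same line appears more than once.
evo860_drives()
{
	awk -F: '/^wd.*860 EVO/ { print $1 }' | sort -u
}

# Hypothetical helper: print, but do not run, the sysctl command that
# would disable NCQ on each matching drive.
ncq_dryrun()
{
	for d in $(evo860_drives); do
		echo "sysctl -w hw.${d}.use_ncq=0"
	done
}
```

After sourcing these functions, something like "dmesg -t | ncq_dryrun" would list the commands for the current system. The -t flag is presumably why the original script works at all: without it, NetBSD 9 dmesg prefixes each line with a timestamp and the ^wd anchor would not match.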