I have just hit this bug on Ubuntu 23.04 with kernel
6.2.0-20-generic. The system is using default settings.
The NVMe drive is a Samsung SSD 970 EVO Plus 2TB with firmware 4B2QEXM7,
which it has had since it left the factory and is the latest available.
The motherboard is an Asus TUF GAMING X670E-PLUS WIFI with firmware 1410.
The issue only shows up after an extended uptime: roughly a week, give
or take a day or two.
The system goes read-only, and the last thing I see in journalctl -f is
this ("touko" in the timestamps is the Finnish abbreviation for May):
touko 27 03:21:01 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting
touko 27 03:21:01 cereza kernel: nvme nvme0: Abort status: 0x0
touko 27 03:21:31 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting
touko 27 03:21:31 cereza kernel: nvme nvme0: Abort status: 0x0
touko 27 03:21:35 cereza kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
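For anyone comparing journals across machines, here is a minimal Python sketch that counts these timeout messages per controller and action. It assumes each kernel message sits on a single line, as journalctl prints it (not wrapped as in this mail); the regular expression and the name summarize_timeouts are my own illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting
#   nvme nvme0: I/O 12 QID 0 timeout, reset controller
# The "(I/O Cmd)" part is optional because older kernels do not print it.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+)(?: \([^)]*\))? "
    r"QID (?P<qid>\d+) timeout, (?P<action>aborting|reset controller)"
)


def summarize_timeouts(lines):
    """Count timeout events per (controller, action) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["ctrl"], match["action"])] += 1
    return counts


# Sample input: the log lines quoted above, one message per line.
sample = [
    "touko 27 03:21:01 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting",
    "touko 27 03:21:01 cereza kernel: nvme nvme0: Abort status: 0x0",
    "touko 27 03:21:31 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting",
    "touko 27 03:21:31 cereza kernel: nvme nvme0: Abort status: 0x0",
    "touko 27 03:21:35 cereza kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller",
]

for (ctrl, action), n in sorted(summarize_timeouts(sample).items()):
    print(f"{ctrl}: {n} x {action}")
```

The same function could be fed the output of journalctl -k, read line by line.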
I have now booted with nvme_core.default_ps_max_latency_us=1200 to see
whether the issue appears again. This should disable the lowest power
state of the drive (state 4, exit latency 9500 us), according to the
power state table from smartctl:
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500
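To make the reasoning about the 1200 us limit explicit, here is a small Python sketch of which non-operational states stay eligible as idle targets. It assumes the driver simply skips any non-operational state whose exit latency exceeds default_ps_max_latency_us, which is my understanding of the kernel's APST setup; it is a simplified model, not the driver's code, and the names POWER_STATES and allowed_idle_states are made up for the example.

```python
# (state, operational, entry_latency_us, exit_latency_us), taken from the
# smartctl power state table above.
POWER_STATES = [
    (0, True, 0, 0),
    (1, True, 0, 200),
    (2, True, 0, 1000),
    (3, False, 2000, 1200),
    (4, False, 500, 9500),
]


def allowed_idle_states(states, max_latency_us):
    """Return the non-operational states still usable as idle targets.

    Assumption: a state is skipped when its exit latency is greater than
    max_latency_us; entry latency is not part of this check.
    """
    return [
        state
        for (state, operational, _entry_us, exit_us) in states
        if not operational and exit_us <= max_latency_us
    ]


print(allowed_idle_states(POWER_STATES, 1200))  # [3]
print(allowed_idle_states(POWER_STATES, 0))     # []
```

Under that assumption, a limit of 1200 keeps state 3 and drops state 4, while a limit of 0 leaves no idle targets at all.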
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866
Title:
nvme drive fails after some time
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Groovy:
Fix Released
Status in Debian:
New
Bug description:
Sorry for the vague title. I thought this was a hardware issue until
someone else online mentioned that their NVMe drive goes "read only"
after some time. I tend not to reboot my system much, so I have a large
journal. Either way, this happens once in a while. The / drive is fine,
but /home is on the NVMe drive, which just disappears. I reboot and
everything is fine, but leave it long enough and it'll fail again.
Here's the most recent journal snippet about the NVMe drive from before
I restarted the system.
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0 731 2 0x4000
Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot