I can confirm the issue is still there on kernel 6.8... The BIOS upgrade is done, and the NVMe drive is on the latest firmware.
When this happens the system has noticeable lag. This system is part of a Ceph cluster, and the lag is sufficient to make the cluster fail services such as the mon on that host...

uname -a:
Linux cc003 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec 5 13:09:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

syslog:
2024-12-26T09:49:18.119398+00:00 cc003 kernel: nvme nvme0: I/O tag 164 (00a4) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:4096
2024-12-26T09:49:18.119419+00:00 cc003 kernel: nvme nvme0: I/O tag 801 (6321) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:4096
2024-12-26T09:49:18.255387+00:00 cc003 kernel: nvme nvme0: I/O tag 165 (80a5) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:4096
2024-12-26T09:49:19.079413+00:00 cc003 kernel: nvme nvme0: I/O tag 166 (60a6) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:36864
2024-12-26T09:49:19.079434+00:00 cc003 kernel: nvme nvme0: I/O tag 167 (d0a7) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:4096
2024-12-26T09:49:19.118394+00:00 cc003 kernel: nvme nvme0: I/O tag 802 (f322) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:16384
2024-12-26T09:49:19.118413+00:00 cc003 kernel: nvme nvme0: I/O tag 803 (f323) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:4096
2024-12-26T09:49:19.125373+00:00 cc003 kernel: nvme nvme0: I/O tag 804 (4324) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:131072
2024-12-26T09:49:48.135387+00:00 cc003 kernel: nvme nvme0: I/O tag 164 (00a4) opcode 0x1 (I/O Cmd) QID 1 timeout, reset controller
2024-12-26T09:49:48.147349+00:00 cc003 kernel: nvme nvme0: Abort status: 0x371
2024-12-26T09:49:48.147385+00:00 cc003 kernel: message repeated 7 times: [ nvme nvme0: Abort status: 0x371]
2024-12-26T09:49:48.167323+00:00 cc003 kernel: nvme nvme0: Shutdown timeout set to 10 seconds
2024-12-26T09:49:48.169931+00:00 cc003 kernel: nvme nvme0: 2/0/0 default/read/poll queues

-- 
https://bugs.launchpad.net/bugs/1991291

Title:
  "nvme nvme0: Abort status: 0x0" / "nvme nvme0: I/O 14 QID 2 timeout, aborting"
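
For anyone comparing hosts, here is a rough sketch of how the timeout messages in the syslog excerpt could be tallied per controller and queue. It is only an illustration: the function name `summarize_timeouts` and the regular expression are my own and assume the exact message wording seen on this 6.8 kernel; other kernel versions may word the line differently (the bug title shows an older form without "tag").

```python
import re
import sys
from collections import Counter

# Matches the 6.8-style timeout line seen in the excerpt, for example:
#   nvme nvme0: I/O tag 164 (00a4) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting ...
# The pattern is an assumption based on that excerpt only.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O tag (?P<tag>\d+) \((?P<cid>[0-9a-f]+)\) "
    r"opcode (?P<opcode>0x[0-9a-f]+) \(.*?\) QID (?P<qid>\d+) timeout, "
    r"(?P<action>aborting|reset controller)"
)


def summarize_timeouts(lines):
    """Count timeout events keyed by (controller, queue id, action)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (match["ctrl"], int(match["qid"]), match["action"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Read log lines from stdin and print one summary row per key.
    for (ctrl, qid, action), n in sorted(summarize_timeouts(sys.stdin).items()):
        print(f"{ctrl} QID {qid} {action}: {n}")
```

Saved under a name of your choice (say `nvme_timeouts.py`), it can be fed the log on stdin, e.g. `python3 nvme_timeouts.py < /var/log/syslog`. Lines that do not match, such as the "Abort status" messages, are simply ignored.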
