[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-07-21 Thread João Pedro Seara
I have implemented a workaround similar to @fatordee's:

$ sudo smartctl -a /dev/nvme0
(...)
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.00W       -        -    0  0  0  0        0       0
 1 +     2.60W       -        -    1  1  1  1        0       0
 2 +     1.70W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    9000
 4 -   0.0025W       -        -    4  4  4  4     5000   44000
(...)

$ cat /etc/default/grub | grep latency
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=9000"

I used the Ex_Lat of the state right before the last one (state 3, 9000), as per [1].

It's a less aggressive workaround than setting the value to 0: this one
disables only the deepest (lowest-power) state instead of all of them.

Seems to be working pretty well.
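A minimal Python sketch of why 9000 drops only state 4. It assumes the kernel's APST setup skips any non-operational state whose Ex_Lat exceeds nvme_core.default_ps_max_latency_us, that a value of 0 disables APST entirely, and that the default limit is 100000 us. PowerState and allowed_apst_states are hypothetical names; the latencies are taken from the smartctl output above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    """One row of smartctl's "Supported Power States" table (hypothetical type)."""

    state: int
    operational: bool  # "+" in the Op column
    entry_latency_us: int  # Ent_Lat
    exit_latency_us: int  # Ex_Lat


def allowed_apst_states(states: list[PowerState], max_latency_us: int) -> list[int]:
    """Non-operational states APST may still enter under the given limit.

    Assumed rule: a limit of 0 disables APST; otherwise a non-operational
    state is kept only if its Ex_Lat does not exceed the limit.
    """
    if max_latency_us == 0:
        return []  # APST disabled, no autonomous transitions at all
    return [
        ps.state
        for ps in states
        if not ps.operational and ps.exit_latency_us <= max_latency_us
    ]


# Latencies (microseconds) from the smartctl output in this comment.
DRIVE = [
    PowerState(0, True, 0, 0),
    PowerState(1, True, 0, 0),
    PowerState(2, True, 0, 0),
    PowerState(3, False, 5000, 9000),
    PowerState(4, False, 5000, 44000),
]

if __name__ == "__main__":
    # 100000 is assumed to be the kernel's default limit.
    for limit in (0, 9000, 100000):
        print(f"{limit:>6} -> {allowed_apst_states(DRIVE, limit)}")
```

Under these assumptions the default limit leaves states 3 and 4 reachable, 9000 leaves only state 3, and 0 leaves none, which is the difference between this workaround and disabling APST outright.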

[1]
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_APST_support

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-05-27 Thread João Pedro Seara
@fatordee, please keep us updated.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-05-27 Thread faattori
I have just encountered this bug on Ubuntu 23.04 with kernel
6.2.0-20-generic. The system is using default settings.

The NVMe drive is a Samsung SSD 970 EVO Plus 2TB with the latest available
firmware, 4B2QEXM7, which it has had since it left the factory.
The motherboard is an Asus TUF GAMING X670E-PLUS WIFI with firmware 1410.

My issue happens only after an extended period of time: more than a
week, give or take a day or two.

The system turns read-only, and this is the last thing I see in
journalctl -f:

May 27 03:21:01 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting
May 27 03:21:01 cereza kernel: nvme nvme0: Abort status: 0x0
May 27 03:21:31 cereza kernel: nvme nvme0: I/O 657 (I/O Cmd) QID 14 timeout, aborting
May 27 03:21:31 cereza kernel: nvme nvme0: Abort status: 0x0
May 27 03:21:35 cereza kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller

I have now set nvme_core.default_ps_max_latency_us=1200 to see whether
the issue appears again. According to smartctl, this should disable the
drive's lowest power state (state 4, whose Ex_Lat of 9500 exceeds 1200):

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500
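The same rule can be applied to any drive. A rough Python sketch, assuming the column layout shown above (Op is "+" or "-", and Ent_Lat and Ex_Lat are the last two columns): it parses the power-state table from smartctl -a and returns the Ex_Lat of the second-to-last state, the heuristic used in this thread. parse_power_states and suggest_max_latency_us are hypothetical helpers, and the regex will need adjusting if your smartctl prints a different layout.

```python
from __future__ import annotations

import re

# Data row: state, Op (+/-), ..., then Ent_Lat and Ex_Lat as the last two columns.
ROW = re.compile(r"^\s*(\d+)\s+([+-])\s.*?\s(\d+)\s+(\d+)\s*$")


def parse_power_states(smartctl_text: str) -> list[tuple[int, bool, int, int]]:
    """Return (state, operational, ent_lat_us, ex_lat_us) for each table row."""
    rows: list[tuple[int, bool, int, int]] = []
    in_table = False
    for line in smartctl_text.splitlines():
        if line.startswith("Supported Power States"):
            in_table = True
            continue
        if not in_table:
            continue
        match = ROW.match(line)
        if match is None:
            if rows:
                break  # first non-row line after the data ends the table
            continue  # skip the column header
        state, op, ent, ex = match.groups()
        rows.append((int(state), op == "+", int(ent), int(ex)))
    return rows


def suggest_max_latency_us(smartctl_text: str) -> int:
    """Ex_Lat of the second-to-last state, meant to exclude only the deepest one."""
    rows = parse_power_states(smartctl_text)
    if len(rows) < 2:
        raise ValueError("expected at least two power states")
    limit = rows[-2][3]
    if rows[-1][3] <= limit:
        # The heuristic only helps if the deepest state has the largest Ex_Lat.
        raise ValueError("deepest state would not be excluded by this limit")
    return limit


# The table from this comment (Samsung 970 EVO Plus 2TB).
SAMPLE = """\
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500
"""

if __name__ == "__main__":
    print(suggest_max_latency_us(SAMPLE))
```

For this table it prints 1200, the value used above. On a live system the text could instead come from subprocess.run(["smartctl", "-a", "/dev/nvme0"], capture_output=True, text=True).stdout, which typically needs root. The extra check on the deepest state is there because the heuristic quietly does nothing useful if the last state's Ex_Lat is not the largest.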

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-02-17 Thread João Pedro Seara
I spoke too soon. The problem still appears. :-)

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-02-15 Thread João Pedro Seara
Well, no occurrences since my last post on Jan 19. It seems that
something changed for the better.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-01-19 Thread João Pedro Seara
Pete,

It seems that [1] suggests the issue *may* be fixed in the latest kernel
revisions. That same page links to the commit discussion, and the fix
seems to concern only Kingston drives.

I will test the newest kernel for Jammy (5.15.0-57) and report back.

JP

[1]
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_APST_support

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-01-19 Thread Pete
Kernel 5.19.0-28 FAIL

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2023-01-19 Thread Pete
Hello,
Same here. On kernel 5.19.0-28 I also solved the NVMe drive issue with the
parameter:
GRUB_CMDLINE_LINUX_DEFAULT="nvme_core.default_ps_max_latency_us=0"

Kind regards,
Peter

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-10-30 Thread João Pedro Seara
Hello, all.

I have solved this issue for now by applying the following workaround:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"

As per:
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_APST_support
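For anyone applying the same workaround, here is a minimal sketch of the edit. The helper name `add_nvme_apst_param` is hypothetical (it is not part of Ubuntu or GRUB). It assumes the stock Ubuntu layout, where kernel options live in `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, and GNU `sed` for in-place editing. It appends the parameter only if it is not already present, and leaves an existing value untouched.

```shell
#!/bin/sh
# add_nvme_apst_param FILE [LATENCY_US]
# Hypothetical helper (not part of Ubuntu or GRUB): appends
# nvme_core.default_ps_max_latency_us=LATENCY_US (default 0) to the
# GRUB_CMDLINE_LINUX_DEFAULT line of FILE. Assumes GNU sed for -i.
add_nvme_apst_param() {
    grub_file=$1
    latency=${2:-0}

    # Leave the file alone if the parameter is already set to any value.
    if grep -q 'nvme_core\.default_ps_max_latency_us=' "$grub_file"; then
        return 0
    fi

    # Insert the parameter just before the closing quote of the
    # GRUB_CMDLINE_LINUX_DEFAULT="..." assignment.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 nvme_core.default_ps_max_latency_us=${latency}\"/" "$grub_file"
}
```

On Ubuntu, after running it as root against /etc/default/grub (ideally after backing the file up), the usual next steps are `sudo update-grub` and a reboot. `cat /sys/module/nvme_core/parameters/default_ps_max_latency_us` should then print the new value. A value of 0 disables APST entirely.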

Thanks,
JP

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-09-02 Thread João Pedro Seara
Also attaching the journalctl output, which shows the problem happening
several times. One occurrence: Sep 01 01:20:02 (shown as "set 01" in the
log, which uses a Portuguese locale).

** Attachment added: "journalctl.txt.gz"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+attachment/5613275/+files/journalctl.txt.gz
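To pull the same failure sequence out of your own journal, one option is to filter kernel messages for the lines that precede the drive dropping out. This is a minimal sketch. The function name `find_nvme_failures` is hypothetical. The patterns are an assumption taken from the log excerpt in the bug description (I/O timeouts, failed controller resets, removal after probe failure). Other kernels or drives may word these messages differently.

```shell
#!/bin/sh
# find_nvme_failures
# Hypothetical filter: reads kernel log text on stdin and prints the lines
# matching the failure sequence from this bug report. The patterns are an
# assumption based on that excerpt and may not cover every kernel version.
find_nvme_failures() {
    grep -E 'nvme nvme[0-9]+: (I/O [0-9]+ QID [0-9]+ timeout|Device not ready; aborting reset|Removing after probe failure)'
}
```

For example, `journalctl -k --no-pager | find_nvme_failures` checks the current boot. If the journal is persistent, `journalctl -k -b -1 --no-pager | find_nvme_failures` checks the previous boot, which is useful when the machine had to be rebooted after the drive disappeared.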

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-09-02 Thread João Pedro Seara
There you go, Kai-Heng. Thanks.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-09-02 Thread João Pedro Seara
Attaching dmesg.

** Attachment added: "dmesg.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+attachment/5613274/+files/dmesg.txt

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-09-01 Thread Kai-Heng Feng
Can you please attach dmesg? Thanks!

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-08-31 Thread João Pedro Seara
Hello Kai-Heng,

No, it is unrelated to system sleep.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-08-31 Thread Kai-Heng Feng
João Pedro Seara,

Does this issue only happen after system sleep?
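
One way to answer this from a saved kernel log is to check whether a resume from suspend was logged before the first NVMe timeout. The sketch below is minimal and makes two assumptions: the kernel's `PM: suspend exit` line marks the resume, and the log comes from something like `journalctl -k -b -1` (the previous boot's kernel messages, which requires a persistent journal). `timeouts_after_resume` is a hypothetical helper name. It only shows ordering, not cause.

```python
import re

# Resume marker: the kernel logs "PM: suspend exit" when it returns from suspend.
RESUME_RE = re.compile(r"PM: suspend exit")
# First sign of trouble in this report, e.g.
# "nvme nvme1: I/O 448 QID 5 timeout, aborting"
TIMEOUT_RE = re.compile(r"nvme (nvme\d+): I/O \d+ QID \d+ timeout")


def timeouts_after_resume(lines):
    """Map each controller to True if a resume was logged before its first timeout."""
    seen_resume = False
    first_timeout = {}
    for line in lines:
        if RESUME_RE.search(line):
            seen_resume = True
            continue
        match = TIMEOUT_RE.search(line)
        # Record only the first timeout per controller.
        if match and match.group(1) not in first_timeout:
            first_timeout[match.group(1)] = seen_resume
    return first_timeout
```

For example, save the log with `journalctl -k -b -1 > kernel.log` and call `timeouts_after_resume(open("kernel.log"))`. A result of `{"nvme1": False}` would mean the drive dropped without any suspend in that boot.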

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2022-08-31 Thread João Pedro Seara
I'm still seeing the same behavior I described in my comment above, now
on 5.15.0-46-generic.

This is very frustrating.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-10-12 Thread Joshua Sjoding
We have an Ubuntu server running eight Samsung 980 Pro PCIe 4.0 NVMe
SSDs (model MZ-V8P1T0BW) on Ubuntu 20.04.3 LTS (GNU/Linux
5.4.0-88-generic x86_64). We've seen this happen at least five times over
the past month, and not always on the same SSD. We first saw it on
5.4.0-81. Some samples from dmesg are below.

This is a production system that runs a set of virtual desktop
instances. Fortunately, the drives are in a ZFS pool of four two-way
mirror (RAID 1) vdevs, so the only outage so far happened when the
failure hit both members of one mirror. After a reboot, the SSDs come back up.
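
All of the samples below follow the same escalation. First come timeouts marked `aborting`, then `reset controller`, then `Device not ready; aborting reset`, and in most samples `Removing after probe failure status: -19` (-19 is -ENODEV). The sketch below pulls that timeline out of `dmesg -T` output for each controller, for example to see which drive in a mirror dropped first. It is minimal. `failure_timeline` and the stage names are hypothetical labels, and the regex only covers the bracketed `dmesg -T` format shown here, not journald's `Jan 08 19:19:11 robot kernel:` prefix.

```python
import re

# `dmesg -T` lines look like:
# "[Mon Sep  6 12:58:36 2021] nvme nvme5: I/O 132 QID 46 timeout, aborting"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] nvme (?P<ctrl>nvme\d+): (?P<msg>.*)$")

# Escalation stages seen in these samples; the names are our own labels.
STAGES = (
    ("io_timeout", re.compile(r"timeout, aborting")),
    ("reset", re.compile(r"timeout, reset controller")),
    ("not_ready", re.compile(r"Device not ready; aborting reset")),
    ("removed", re.compile(r"Removing after probe failure")),
)


def failure_timeline(lines):
    """Return {controller: {stage: timestamp of its first occurrence}}."""
    timeline = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an nvme line in dmesg -T format
        stages = timeline.setdefault(match["ctrl"], {})
        for name, pattern in STAGES:
            if name not in stages and pattern.search(match["msg"]):
                stages[name] = match["ts"]
    return timeline
```

For example, `failure_timeline(open("dmesg.txt"))` on a file saved with `dmesg -T > dmesg.txt` returns one entry per controller that logged anything.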

[Mon Sep  6 12:58:36 2021] nvme nvme5: I/O 132 QID 46 timeout, aborting
[Mon Sep  6 12:58:37 2021] nvme nvme5: I/O 133 QID 46 timeout, aborting
[Mon Sep  6 12:58:39 2021] nvme nvme5: I/O 134 QID 46 timeout, aborting
[Mon Sep  6 12:58:40 2021] nvme nvme5: I/O 135 QID 46 timeout, aborting
[Mon Sep  6 12:58:40 2021] nvme nvme5: I/O 784 QID 48 timeout, aborting
[Mon Sep  6 12:58:41 2021] nvme nvme5: I/O 136 QID 46 timeout, aborting
[Mon Sep  6 12:58:41 2021] nvme nvme5: I/O 137 QID 46 timeout, aborting
[Mon Sep  6 12:58:42 2021] nvme nvme5: I/O 492 QID 28 timeout, aborting
[Mon Sep  6 12:59:07 2021] nvme nvme5: I/O 132 QID 46 timeout, reset controller
[Mon Sep  6 12:59:38 2021] nvme nvme5: I/O 24 QID 0 timeout, reset controller
[Mon Sep  6 13:00:29 2021] nvme nvme5: Device not ready; aborting reset
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:29 2021] nvme nvme5: Abort status: 0x371
[Mon Sep  6 13:00:33 2021] INFO: task txg_quiesce:2172 blocked for more than 120 seconds.
[Mon Sep  6 13:00:33 2021]   Tainted: P   OE 5.4.0-81-generic #91-Ubuntu

[Tue Sep 21 21:18:36 2021] nvme nvme2: I/O 175 QID 38 timeout, aborting
[Tue Sep 21 21:18:37 2021] nvme nvme2: I/O 240 QID 26 timeout, aborting
[Tue Sep 21 21:18:47 2021] nvme nvme2: I/O 718 QID 23 timeout, aborting
[Tue Sep 21 21:18:56 2021] nvme nvme2: I/O 719 QID 23 timeout, aborting
[Tue Sep 21 21:19:06 2021] nvme nvme2: I/O 175 QID 38 timeout, reset controller
[Tue Sep 21 21:19:37 2021] nvme nvme2: I/O 17 QID 0 timeout, reset controller
[Tue Sep 21 21:20:27 2021] nvme nvme2: Device not ready; aborting reset
[Tue Sep 21 21:20:27 2021] nvme nvme2: Abort status: 0x371
[Tue Sep 21 21:20:27 2021] nvme nvme2: Abort status: 0x371
[Tue Sep 21 21:20:27 2021] nvme nvme2: Abort status: 0x371
[Tue Sep 21 21:20:27 2021] nvme nvme2: Abort status: 0x371
[Tue Sep 21 21:20:47 2021] nvme nvme2: Device not ready; aborting reset
[Tue Sep 21 21:20:47 2021] nvme nvme2: Removing after probe failure status: -19
[Tue Sep 21 21:21:08 2021] nvme nvme2: Device not ready; aborting reset

[Tue Oct  5 16:54:59 2021] nvme nvme6: I/O 1013 QID 38 timeout, aborting
[Tue Oct  5 16:54:59 2021] nvme nvme6: I/O 727 QID 39 timeout, aborting
[Tue Oct  5 16:55:03 2021] nvme nvme6: I/O 1014 QID 38 timeout, aborting
[Tue Oct  5 16:55:05 2021] nvme nvme6: I/O 1015 QID 38 timeout, aborting
[Tue Oct  5 16:55:25 2021] nvme nvme6: I/O 15 QID 21 timeout, aborting
[Tue Oct  5 16:55:25 2021] nvme nvme6: I/O 408 QID 37 timeout, aborting
[Tue Oct  5 16:55:29 2021] nvme nvme6: I/O 1013 QID 38 timeout, reset controller
[Tue Oct  5 16:55:59 2021] nvme nvme6: I/O 11 QID 0 timeout, reset controller
[Tue Oct  5 16:56:51 2021] nvme nvme6: Device not ready; aborting reset
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:56:51 2021] nvme nvme6: Abort status: 0x371
[Tue Oct  5 16:57:11 2021] nvme nvme6: Device not ready; aborting reset
[Tue Oct  5 16:57:11 2021] nvme nvme6: Removing after probe failure status: -19
[Tue Oct  5 16:57:32 2021] nvme nvme6: Device not ready; aborting reset
[Tue Oct  5 16:57:32 2021] blk_update_request: I/O error, dev nvme6n1, sector 842198232 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0

[Mon Oct 11 12:14:38 2021] nvme nvme2: I/O 306 QID 48 timeout, aborting
[Mon Oct 11 12:14:39 2021] nvme nvme2: I/O 827 QID 14 timeout, aborting
[Mon Oct 11 12:15:01 2021] nvme nvme2: I/O 828 QID 14 timeout, aborting
[Mon Oct 11 12:15:05 2021] nvme nvme2: I/O 829 QID 14 timeout, aborting
[Mon Oct 11 12:15:07 2021] nvme nvme2: I/O 830 QID 14 timeout, aborting
[Mon Oct 11 12:15:08 2021] nvme nvme2: I/O 306 QID 48 timeout, reset controller
[Mon Oct 11 12:15:38 2021] nvme nvme2: I/O 20 QID 0 timeout, reset controller
[Mon Oct 11 12:16:29 2021] nvme nvme2: Device not ready; 

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-10-02 Thread Andre Ruiz
I'm seeing this on the Focal kernel 5.4.0-88. Is this expected? Do I have
to switch to the HWE kernel mentioned above to fix it?

The laptop had been stable for a long time and then suddenly started
showing this exact symptom a few days ago. I'm wondering whether this was
introduced in the latest GA kernels for Focal or whether it was always there.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-04-07 Thread stevecam
** Also affects: debian
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released
Status in Debian:
  New

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-26 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.8.0-41.46

---
linux (5.8.0-41.46) groovy; urgency=medium

  * groovy/linux: 5.8.0-41.46 -proposed tracker (LP: #1912219)

  * Groovy update: upstream stable patchset 2020-12-17 (LP: #1908555) // nvme
    drive fails after some time (LP: #1910866)
    - Revert "nvme-pci: remove last_sq_tail"

  * initramfs unpacking failed (LP: #1835660)
    - SAUCE: lib/decompress_unlz4.c: correctly handle zero-padding around
      initrds.

  * overlay: permission regression in 5.4.0-51.56 due to patches related to
    CVE-2020-16120 (LP: #1900141)
    - ovl: do not fail because of O_NOATIME

 -- Kleber Sacilotto de Souza  Mon, 18 Jan 2021 17:01:08 +0100

** Changed in: linux (Ubuntu Groovy)
   Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-16120
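
To check whether a running Groovy kernel already includes this upload, you can compare the string in `/proc/version_signature` (reported above as `ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18`) against 5.8.0-41.46. The sketch below is minimal, and `has_groovy_fix` is a hypothetical name. It only answers for the 5.8.0 series and returns `None` for anything else, such as Focal's 5.4 GA kernels, because other series would carry the fix under different version numbers, if at all. It reads only the leading numbers, so any suffix after them (for example on a Focal HWE backport) is ignored. That is an assumption, not something this thread confirms.

```python
import re

# First Groovy (5.8.0) kernel upload containing the revert, per the changelog.
FIXED = (5, 8, 0, 41, 46)

# /proc/version_signature looks like "Ubuntu 5.8.0-34.37-generic 5.8.18"
SIG_RE = re.compile(r"^Ubuntu (\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def has_groovy_fix(signature):
    """True/False for 5.8.0 kernels; None if the series or format is different."""
    match = SIG_RE.match(signature.strip())
    if not match:
        return None
    version = tuple(int(part) for part in match.groups())
    if version[:3] != FIXED[:3]:
        return None  # different upstream series; comparison is meaningless
    # Tuples compare element by element: ABI number first, then upload number.
    return version >= FIXED
```

Usage: `has_groovy_fix(open("/proc/version_signature").read())`. With the signature from this report it returns `False`.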

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Released

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-25 Thread Kelsey Skunberg
@Andrew, thank you for testing! I'm switching the verification status to
'verification-done-groovy'.

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
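The kernel log in the bug description follows a recognisable sequence. First come I/O timeouts that get aborted, then controller resets, then "Removing after probe failure". After that, the filesystem on the drive reports I/O errors. The helper below is a minimal sketch and is not something proposed in this thread. The function name nvme_failure_summary is hypothetical. It counts those three message types in kernel log text read from standard input. The patterns assume the exact message wording quoted above, which may differ on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug thread): count the NVMe failure
# signature from the log quoted above in kernel log text read on stdin.
# The regexes assume the message wording seen on the 5.8 kernel in this report.
nvme_failure_summary() {
    awk '
        /nvme nvme[0-9]+: I\/O [0-9]+ QID [0-9]+ timeout, aborting/         { aborted++ }
        /nvme nvme[0-9]+: I\/O [0-9]+ QID [0-9]+ timeout, reset controller/ { resets++ }
        /nvme nvme[0-9]+: Removing after probe failure/                     { removed++ }
        END { printf "aborted=%d resets=%d removed=%d\n", aborted, resets, removed }
    '
}

# Example: scan the previous boot's kernel messages (needs a persistent journal)
#   journalctl -k -b -1 | nvme_failure_summary
```

A non-zero "removed" count suggests that the drive dropped off the bus in the way this report describes. Timeouts alone can have other causes.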
  

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-21 Thread Andrew Hayzen
@Kleber I have installed the focal hwe kernel from -proposed (as seen
below). So far, in A/B testing, this kernel is working correctly :-)
I will continue running this kernel and report any issues I have.

Also note that I have been continuously running the test kernel (from
comment 22) since last week and it has worked perfectly so far :-)

I look forward to this migrating from -proposed into focal.

$ uname -a
Linux xps-13-9360 5.8.0-41-generic #46~20.04.1-Ubuntu SMP Mon Jan 18 17:52:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ apt policy linux-generic-hwe-20.04
linux-generic-hwe-20.04:
  Installed: 5.8.0.41.46~20.04.27
  Candidate: 5.8.0.41.46~20.04.27
  Version table:
 *** 5.8.0.41.46~20.04.27 500
        500 http://gb.archive.ubuntu.com/ubuntu focal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     5.8.0.40.45~20.04.25 500
        500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
     5.8.0.38.43~20.04.23 500
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
     5.4.0.26.32 500
        500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
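Per this thread, the fix ships in the focal hwe kernel currently in -proposed (5.8.0-41.46~20.04.1). The 5.8.0-40 build in focal-updates does not appear to carry it yet. The sketch below assumes ABI 41 is the cut-off for the 5.8.0 series. Under that assumption, it checks a release string like the one uname -r prints. The helper kernel_has_fix and the FIXED_ABI value are hypothetical. The check only says anything about 5.8.0 kernels. A custom test build such as 5.8.0-38-generic #43+lp1910866 carries the patch but would still be reported as below the cut-off.

```shell
#!/bin/sh
# Hypothetical helper: is a kernel release string such as "5.8.0-41-generic"
# (the format printed by `uname -r`) at or above an assumed fixed ABI number?
# Exit status: 0 = at/above cut-off, 1 = below, 2 = not a 5.8.0 kernel (unknown).
FIXED_ABI=41   # assumption based on the -proposed version named in this thread

kernel_has_fix() {
    release=$1               # e.g. 5.8.0-41-generic
    base=${release%%-*}      # -> 5.8.0
    rest=${release#*-}       # -> 41-generic
    abi=${rest%%-*}          # -> 41
    [ "$base" = "5.8.0" ] || return 2
    case $abi in
        ''|*[!0-9]*) return 2 ;;   # ABI is not a plain number, cannot decide
    esac
    [ "$abi" -ge "$FIXED_ABI" ]
}

# Example:
#   if kernel_has_fix "$(uname -r)"; then echo "fix expected"; fi
```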

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-21 Thread Kleber Sacilotto de Souza
Hello Alan or anyone else affected,

The fix for this bug is also available in the hwe kernel for Focal
currently in -proposed (version 5.8.0-41.46~20.04.1). Feedback on
whether this kernel fixes the nvme issue would be appreciated.

Thank you.

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-20 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-groovy' to 'verification-done-groovy'. If the
problem still exists, change the tag 'verification-needed-groovy' to
'verification-failed-groovy'.

If verification is not done within 5 working days, this fix will be
dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-groovy
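The EnableProposed wiki page covers the full procedure. As a rough illustration, testing usually starts with adding an APT source line for the release's -proposed pocket. The hypothetical helper proposed_source_line below only prints such a line. The default mirror and the component list are assumptions. Match them to the entries already in /etc/apt/sources.list.

```shell
#!/bin/sh
# Hypothetical helper: print an APT "deb" line for a release's -proposed pocket.
# The mirror default and component list are assumptions; copy them from your
# existing /etc/apt/sources.list entries.
proposed_source_line() {
    codename=$1                                      # e.g. groovy or focal
    mirror=${2:-http://archive.ubuntu.com/ubuntu}    # optional mirror override
    printf 'deb %s %s-proposed main restricted universe multiverse\n' \
        "$mirror" "$codename"
}

# Example (review before running; this writes a new APT source file):
#   proposed_source_line "$(lsb_release -cs)" |
#       sudo tee /etc/apt/sources.list.d/proposed.list
```

After an apt update, the pocket is visible to APT. The wiki describes how to limit which packages are actually taken from it.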

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-19 Thread Kleber Sacilotto de Souza
Thank you Andrew for your feedback!

We have applied the fix for groovy/linux (and focal/linux-hwe-5.8) and
the new kernels will be available in -proposed soon. These packages are
planned to be promoted to -updates early next week.

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-18 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Groovy)
   Status: In Progress => Fix Committed

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-18 Thread Kleber Sacilotto de Souza
** Also affects: linux (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Groovy)
   Status: New => In Progress

-- 
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Groovy:
  In Progress

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Andrew Hayzen
@Marcelo So far it looks good :-) It passes the "fio" command test when
A/B testing between a known bad kernel and this new kernel. I will
continue running it on this machine over the weekend to make sure longer
usage doesn't reveal any remaining issues, but it looks like it resolves
the issue so far :-D Thanks!

$ uname -a
Linux xps-13-9360 5.8.0-38-generic #43+lp1910866 SMP Fri Jan 15 20:29:27 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Andrew Hayzen
Thanks! I'll take a look :-)

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Marcelo Cerri
Hi, Andrew.

I created a test kernel with the fix and it is available at:

https://kernel.ubuntu.com/~mhcerri/lp1910866_linux-5.8.0-38-generic_5.8.0-38.43+lp1910866_amd64.tar.gz

You can install it by extracting the tarball and installing the Debian
packages:

$ tar xf lp1910866_linux-5.8.0-38-generic_5.8.0-38.43+lp1910866_amd64.tar.gz
$ sudo apt install ./*.deb

Please let us know if the test kernel solves the problem.
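Before reporting results, it helps to confirm that the machine actually booted into the test kernel. The sketch below assumes the test build carries the lp1910866 tag in its version string, as the tarball name above suggests. is_test_kernel is a hypothetical helper name and is not part of any Ubuntu tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel version string carries the
# "lp1910866" tag of the test build (assumption based on the tarball name).
is_test_kernel() {
    # $1: version string to check; defaults to the running kernel's `uname -v`
    version=${1:-$(uname -v)}
    case "$version" in
        *lp1910866*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_test_kernel; then
    echo "Running the test kernel: $(uname -r)"
else
    echo "Not running the test kernel yet; reboot into it first"
fi
```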

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Terry Rudd
Andrew, we plan to address this in the Focal 5.8 hwe kernel and we're
going to be building a test kernel.  We would really appreciate you
testing it since you have a reliable reproducer.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Andrew Hayzen
@kaihengfeng Thanks for the quick response! Bug 1908555, linked there,
only lists Groovy as a target series; I hope the fix will also be
applied to the Focal HWE kernel :-)

Also I am happy to test any kernel in a -proposed channel or PPA to
confirm it fixes the issue if that helps :-)

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Kai-Heng Feng
OK, the fix will be in the next 5.8 update:
commit f62ddacc4cb141b86ed647f9dd9eeb7653b0cc43
Author: Keith Busch 
Date:   Fri Oct 30 10:28:54 2020 -0700

Revert "nvme-pci: remove last_sq_tail"

BugLink: https://bugs.launchpad.net/bugs/1908555

[ Upstream commit 38210800bf66d7302da1bb5b624ad68638da1562 ]

Multiple CPUs may be mapped to the same hctx, allowing mulitple
submission contexts to attempt commit_rqs(). We need to verify we're
not writing the same doorbell value multiple times since that's a spec
violation.

Revert commit 54b2fcee1db041a83b52b51752dade6090cf952f.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1878596
Reported-by: "B.L. Jones" 
Signed-off-by: Keith Busch 
Signed-off-by: Sasha Levin 
Signed-off-by: Kamal Mostafa 
Signed-off-by: Ian May 
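Here is a more concrete picture of the commit message. When several CPUs map to one hardware context, more than one caller can reach commit_rqs() without queueing anything new. Without last_sq_tail, each of those callers would write the same tail value to the doorbell again. The toy model below is written in shell and is not driver code. It only illustrates the bookkeeping that the revert restores. It ignores queue wraparound and the sq_lock spinlock, and submit and commit_rqs here are stand-ins for the real nvme-pci functions.

```shell
#!/bin/sh
# Toy model, not nvme-pci code: remember the last tail value written to the
# submission queue doorbell and skip commit_rqs() writes that would repeat it.
sq_tail=0          # next free slot in the submission queue
last_sq_tail=0     # tail value most recently written to the doorbell
doorbell_writes=0  # counts simulated doorbell writes

submit() {
    # Queue one command without ringing the doorbell yet (wraparound ignored).
    sq_tail=$((sq_tail + 1))
}

commit_rqs() {
    # May be reached from several submission contexts sharing one hctx.
    if [ "$sq_tail" -ne "$last_sq_tail" ]; then
        doorbell_writes=$((doorbell_writes + 1))  # stands in for writel()
        last_sq_tail=$sq_tail
    fi
}

submit
commit_rqs   # new tail (1): doorbell written
commit_rqs   # same tail again: write skipped instead of repeated
echo "doorbell writes: $doorbell_writes"   # prints: doorbell writes: 1
```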


** Bug watch added: Red Hat Bugzilla #1878596
   https://bugzilla.redhat.com/show_bug.cgi?id=1878596

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-15 Thread Andrew Hayzen
@kaihengfeng

So v5.7 was fine, and after many reboots the bisect found that the
commit below introduced the issue.

Do I also need to find when the issue was resolved (between v5.8-rc1
and v5.9.10), or is this information enough?


54b2fcee1db041a83b52b51752dade6090cf952f is the first bad commit
commit 54b2fcee1db041a83b52b51752dade6090cf952f
Author: Keith Busch 
Date:   Mon Apr 27 11:54:46 2020 -0700

nvme-pci: remove last_sq_tail

The nvme driver does not have enough tags to wrap the queue, and blk-mq
will no longer call commit_rqs() when there are no new submissions to
notify.

Signed-off-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 

 drivers/nvme/host/pci.c | 23 ---
 1 file changed, 4 insertions(+), 19 deletions(-)


And FWIW, here is the output of $ git bisect log:
git bisect start
# good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
# bad: [b3a9e3b9622ae10064826dccb4f7a52bd88c7407] Linux 5.8-rc1
git bisect bad b3a9e3b9622ae10064826dccb4f7a52bd88c7407
# bad: [ee01c4d72adffb7d424535adf630f2955748fa8b] Merge branch 'akpm' (patches from Andrew)
git bisect bad ee01c4d72adffb7d424535adf630f2955748fa8b
# bad: [16d91548d1057691979de4686693f0ff92f46000] Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 16d91548d1057691979de4686693f0ff92f46000
# good: [cfa3b8068b09f25037146bfd5eed041b78878bee] Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good cfa3b8068b09f25037146bfd5eed041b78878bee
# good: [3fd911b69b3117e03181262fc19ae6c3ef6962ce] Merge tag 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect good 3fd911b69b3117e03181262fc19ae6c3ef6962ce
# good: [1966391fa576e1fb2701be8bcca197d8f72737b7] mm/migrate.c: attach_page_private already does the get_page
git bisect good 1966391fa576e1fb2701be8bcca197d8f72737b7
# bad: [0c8d3fceade2ab1bbac68bca013e62bfdb851d19] bcache: configure the asynchronous registertion to be experimental
git bisect bad 0c8d3fceade2ab1bbac68bca013e62bfdb851d19
# bad: [84b8d0d7aa159652dc191d58c4d353b6c9173c54] nvmet: use type-name map for ana states
git bisect bad 84b8d0d7aa159652dc191d58c4d353b6c9173c54
# good: [72e6329f86c714785ac195d293cb19dd24507880] nvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request
git bisect good 72e6329f86c714785ac195d293cb19dd24507880
# good: [e4fcc72c1a420bdbe425530dd19724214ceb44ec] nvmet-fc: slight cleanup for kbuild test warnings
git bisect good e4fcc72c1a420bdbe425530dd19724214ceb44ec
# good: [31fdad7be18992606078caed6ff71741fa76310a] nvme: consolodate io settings
git bisect good 31fdad7be18992606078caed6ff71741fa76310a
# bad: [2a5bcfdd41d68559567cec3c124a75e093506cc1] nvme-pci: align io queue count with allocted nvme_queue in nvme_probe
git bisect bad 2a5bcfdd41d68559567cec3c124a75e093506cc1
# good: [6623c5b3dfa5513190d729a8516db7a5163ec7de] nvme: clean up error handling in nvme_init_ns_head
git bisect good 6623c5b3dfa5513190d729a8516db7a5163ec7de
# good: [74943d45eef4db64b1e5c9f7ad1d018576e113c5] nvme-pci: remove volatile cqes
git bisect good 74943d45eef4db64b1e5c9f7ad1d018576e113c5
# bad: [54b2fcee1db041a83b52b51752dade6090cf952f] nvme-pci: remove last_sq_tail
git bisect bad 54b2fcee1db041a83b52b51752dade6090cf952f
# first bad commit: [54b2fcee1db041a83b52b51752dade6090cf952f] nvme-pci: remove last_sq_tail
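You can save a log like this with git bisect log > bisect.log and re-apply it later with git bisect replay bisect.log. You can also extract the culprit from it mechanically. The sketch below assumes the "# first bad commit: [<sha>] <subject>" line format shown above. first_bad_commit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `git bisect log` output on stdin and print the
# full SHA-1 recorded on its "# first bad commit:" line, if any.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)].*/\1/p'
}

# Example usage inside a kernel tree that has finished bisecting:
# git bisect log | first_bad_commit
```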

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-12 Thread Kai-Heng Feng
Thanks a lot!
Can you please test v5.7? Stable (point) releases aren't linear with the
mainline kernel.

Once you are sure v5.7 is good, we can start a bisect:
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good v5.7
$ git bisect bad v5.8-rc1
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel, then reboot into it.
If it still has the same issue,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat from "make -j`nproc` deb-pkg" until you find the offending commit.
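At each step, the choice between git bisect good and git bisect bad depends on whether the NVMe errors from the original report reappear. The sketch below assumes that the "timeout, reset controller" and "Removing after probe failure" messages in this bug's log are a reliable failure signature. classify_boot is a hypothetical name. The drive only fails after some time, so a "good" result only means the failure has not shown up yet. Give each test kernel a long enough run before trusting that result.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print "bad" if it
# contains the NVMe failure messages seen in this bug, otherwise "good".
classify_boot() {
    if grep -Eq 'nvme nvme[0-9]+: .*(timeout, reset controller|Removing after probe failure)'; then
        echo bad
    else
        echo good
    fi
}

# Example: classify the kernel messages of the current boot.
# journalctl -k -b | classify_boot
```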

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-12 Thread Andrew Hayzen
And here is the bisect between 5.4.78 (good) and 5.8.18 (bad).

Results with the mainline kernels:
v5.8.18/    FAIL
v5.8.4/     FAIL
v5.8-rc5/   FAIL
v5.8-rc1/   FAIL
v5.7.19/    PASS
v5.7.18/    PASS
v5.7.16/    PASS
v5.6.14/    PASS
v5.4.78/    PASS

From these and the previous comment's results, it appears that the issue
was introduced in 5.8-rc1 and then fixed in 5.9.9 or 5.9.10.
(Unfortunately the 5.9.9 build is missing, so I cannot try it.)
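Narrowing a range by hand like this comes down to finding where adjacent results flip. Below is a minimal sketch of that step. The function name `find_transitions` and its `version RESULT [notes]` input format (one kernel per line, oldest first) are assumptions made for illustration. Entries that are neither PASS nor FAIL, such as MISSING, are skipped.

```shell
#!/usr/bin/env bash
# Sketch: report where results flip between adjacent tested kernels.
# Input on stdin: one "version RESULT [notes...]" line per kernel,
# ordered oldest first. Lines whose RESULT is not PASS or FAIL
# (for example MISSING) are ignored.
find_transitions() {
  awk '
    $2 != "PASS" && $2 != "FAIL" { next }
    prev != "" && $2 != prev {
      kind = ($2 == "FAIL") ? "regression" : "fix"
      printf "%s between %s (%s) and %s (%s)\n", kind, prev_ver, prev, $1, $2
    }
    { prev_ver = $1; prev = $2 }
  '
}
```

Feed it the table above in oldest-first order, without the trailing slashes. It prints `regression between v5.7.19 (PASS) and v5.8-rc1 (FAIL)`. Kai-Heng has noted that stable point releases are not linear with mainline. That is why the git bisect itself runs from v5.7 to v5.8-rc1, and v5.7.19 is not used as the good end.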

@kaihengfeng let me know if there is any other information I can
provide.


[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-12 Thread Andrew Hayzen
So, bisecting between 5.8.18 (bad) and 5.11-rc3 (good).

Results with the mainline kernels:
v5.11-rc3/  PASS
v5.9.12/    PASS
v5.9.10/    PASS
v5.9.9/     MISSING
v5.9.8/     FAIL (could not boot long enough for full test)
v5.9.7/     FAIL (could not boot long enough for full test)
v5.9.2/     FAIL (could not boot long enough for full test)
v5.8.18/    FAIL

Note that 5.9.2, 5.9.7 and 5.9.8 all crashed either during boot or while
logging in. After performing REISUB, they all entered the Dell
BIOS/recovery screen stating that the hard disk could not be found, so I
assume this is the same failure.

From these results it appears that the issue was fixed between 5.9.8 and
5.9.10.
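To find the commit that fixed the problem, git bisect can run "in reverse" using custom terms (`--term-old`/`--term-new`). This is a minimal sketch that rests on a few assumptions. The current directory must be a clone of the stable tree (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git), because the v5.9.x point-release tags are not in the torvalds tree used in Kai-Heng's steps. The helper names and the `DRY_RUN` switch are hypothetical; setting `DRY_RUN=1` only prints the git commands.

```shell
#!/usr/bin/env bash
# Sketch: reverse bisect to find the commit that FIXED the NVMe failure.
# Assumes the current directory is a clone of the stable kernel tree.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# broken: last known-bad tag (e.g. v5.9.8); fixed: first known-good tag.
reverse_bisect_start() {
  local broken="$1" fixed="$2"
  # With custom terms, the first revision listed is the "new" (fixed) one.
  run git bisect start --term-old=broken --term-new=fixed "$fixed" "$broken"
}

# Record a test outcome using the custom terms.
reverse_bisect_mark() {
  case "$1" in
    pass|PASS) run git bisect fixed ;;
    fail|FAIL) run git bisect broken ;;
    *) echo "unknown result '$1'" >&2; return 1 ;;
  esac
}
```

Start with `reverse_bisect_start v5.9.8 v5.9.10`. After that, build, install and test each candidate as in the forward bisect, then mark it with `reverse_bisect_mark pass` or `reverse_bisect_mark fail`.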


[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-12 Thread Andrew Hayzen
OK, https://people.canonical.com/~kernel/info/kernel-version-map.html
states that Ubuntu kernel 5.8.0-36.40~20.04.1 matches mainline version
5.8.18. I have installed mainline 5.8.18 and it fails! So it is not the
Ubuntu patches.

Ubuntu Kernels:
linux-image-5.4.0-59-generic: PASS
linux-image-5.8.0-36-generic: FAIL

Mainline Kernels:
linux-image-unsigned-5.8.18-050818-generic: FAIL
linux-image-unsigned-5.11.0-051100rc3-generic: PASS

I'll see if I can find where it changes from FAIL to PASS between 5.8.18
and 5.11-rc3 in the mainline kernels. Please advise if I should also (or
instead) compare between 5.4 and 5.8.18 :-)
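On Ubuntu, the same Ubuntu-to-mainline mapping is available on the running system. The `ProcVersionSignature` field in the bug report (`Ubuntu 5.8.0-34.37-generic 5.8.18`) comes from `/proc/version_signature`, and its last field is the upstream version. The sketch below extracts that field. The function name `mainline_base` is hypothetical. With no argument it reads `/proc/version_signature`.

```shell
#!/usr/bin/env bash
# Sketch: print the upstream (mainline) version an Ubuntu kernel is based on.
# Takes a version-signature string, or reads /proc/version_signature.
mainline_base() {
  local sig
  if [ "$#" -gt 0 ]; then
    sig="$1"
  else
    sig="$(cat /proc/version_signature)"
  fi
  # The last whitespace-separated field is the upstream version.
  printf '%s\n' "$sig" | awk '{ print $NF }'
}
```

For example, `mainline_base 'Ubuntu 5.8.0-34.37-generic 5.8.18'` prints `5.8.18`.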


[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-12 Thread Andrew Hayzen
@kaihengfeng

I have found that running the command

  fio --name=basic --directory=/path/to/empty/directory --size=1G --rw=randrw --numjobs=4 --loops=5

runs fine on linux-image-5.4.0-59-generic, but on
linux-image-5.8.0-36-generic it freezes the system in the "Laying out IO
file" stage. I checked on two subsequent boots that 5.8 fails like this
on an empty directory, and I will now use this as my test of whether a
kernel works or not.
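The reproducer can be wrapped so that it only ever runs against an empty directory, as in Andrew's test. This is a minimal sketch. The function name `fio_repro` and the `DRY_RUN` switch are hypothetical, and it assumes the directory is on the NVMe filesystem under test. On an affected kernel the real run reportedly freezes the whole machine, so save your work first.

```shell
#!/usr/bin/env bash
# Sketch: run the fio reproducer from the comment above against DIR.
# Assumes DIR is an empty directory on the NVMe filesystem under test.
# WARNING: on an affected kernel this reportedly froze the whole system.
fio_repro() {
  local dir="$1"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 2; }
  # The reproducer was run on an empty directory; refuse anything else.
  [ -z "$(ls -A "$dir")" ] || { echo "directory not empty: $dir" >&2; return 2; }
  local cmd=(fio --name=basic "--directory=$dir" --size=1G --rw=randrw
             --numjobs=4 --loops=5)
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command that would be run.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

fio leaves its data files in the directory. Delete them before the next run, or the empty-directory check will refuse to start.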

I have installed the 5.11-rc3 mainline kernel you linked; note that I had
to disable Secure Boot to be able to use it. This kernel worked
successfully on two boots with the fio test above.
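You can check whether Secure Boot is active before installing an unsigned mainline build such as `linux-image-unsigned-5.11.0-051100rc3-generic`. This minimal sketch assumes `mokutil` is installed and that `mokutil --sb-state` prints a line like `SecureBoot enabled`. The function name is hypothetical, and the state string can also be passed in directly.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if Secure Boot is reported as enabled.
# Assumes "mokutil --sb-state" prints e.g. "SecureBoot enabled".
secure_boot_enabled() {
  local state
  if [ "$#" -gt 0 ]; then
    state="$1"
  else
    state="$(mokutil --sb-state 2>/dev/null)"
  fi
  case "$state" in
    *"SecureBoot enabled"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `secure_boot_enabled && echo "disable Secure Boot before booting an unsigned mainline kernel"`.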

In summary so far, on my system with the fio test:
linux-image-5.4.0-59-generic: PASS
linux-image-5.8.0-36-generic: FAIL
linux-image-unsigned-5.11.0-051100rc3-generic: PASS

Please advise how to proceed here: should I start manually picking (by
bisecting) kernels between 5.8 and 5.11, or between 5.4 and 5.8?

Also, I guess I should try 5.8 mainline to ensure that the Ubuntu
patches aren't causing the issue?


[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Kai-Heng Feng
Andrew, since you can reliably reproduce the issue, can you please test the latest
mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11-rc3/amd64/

And we'll do a bisect or reverse-bisect based on the result.
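After installing a mainline build and rebooting, it is worth confirming that the test kernel is the one actually running. The package names in this thread (`5.8.18-050818`, `5.11.0-051100rc3`) suggest a naming pattern for mainline builds: zero-padded major/minor/patch digits, followed by any `rc` suffix. The sketch below assumes that pattern holds in general and derives the expected `uname -r` prefix from a mainline tag. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the version string used by kernel.ubuntu.com mainline
# builds from a tag such as v5.11-rc3 or v5.8.18. The pattern is inferred
# from package names like linux-image-unsigned-5.11.0-051100rc3-generic.
mainline_abi() {
  local tag="${1#v}" base rc="" major minor patch
  case "$tag" in
    *-rc*) rc="rc${tag##*-rc}"; base="${tag%-rc*}" ;;
    *) base="$tag" ;;
  esac
  IFS=. read -r major minor patch <<< "$base"
  patch="${patch:-0}"   # "5.11" means 5.11.0
  printf '%s.%s.%s-%02d%02d%02d%s\n' \
    "$major" "$minor" "$patch" "$major" "$minor" "$patch" "$rc"
}

# Return 0 if the running kernel matches the given mainline tag.
running_mainline() {
  case "$(uname -r)" in
    "$(mainline_abi "$1")"-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `mainline_abi v5.11-rc3` prints `5.11.0-051100rc3`, and `running_mainline v5.11-rc3 && echo "test kernel is running"` checks the booted kernel.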

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
  

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Andrew Hayzen
FYI, I have captured the `sudo lspci -vv` output on kernel 5.8 *before*
the issue occurred: https://pastebin.ubuntu.com/p/GtZyTWzKTd/. It is
subtly different from the output on the 5.4 kernel (which has not had
the issue), in case that matters.

I was also able to reproduce the issue again by causing high disk I/O.
Specifically, writes needed to be occurring for it to happen (I was
recursively grep'ing the whole filesystem while installing apt/pip
packages inside a docker container).

This froze the system for 120 seconds until write timeouts occurred, and
then the disk was remounted read-only. After this point, commands on the
system failed with I/O errors (even basic ones such as "top", although
some, such as "mount", still worked).

Our plan was to retrieve more information by copying the lspci binary
and its libraries into a tmpfs in RAM, so that it would still be
accessible after the disk stopped. This almost worked, but it appears a
few more configuration files would also need to be placed in RAM (I could
run "lspci --help" but not "lspci" or "lspci -vv"). Instead, popey has
suggested using a USB key with debootstrap/chroot. (Suggestions on how we
can retrieve more information at this point, and on which commands would
be useful to run, are welcome.)
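The first step of the tmpfs approach is collecting the binary together with the shared libraries that `ldd` lists. This is a minimal sketch under several assumptions: GNU coreutils `cp --parents` is available, `ldd` uses its usual `name => /path (address)` output format, and the function names are hypothetical. As Andrew found, libraries alone may not be enough. `lspci` possibly also needs data files such as its `pci.ids` database, so those would need staging as well.

```shell
#!/usr/bin/env bash
# Sketch: stage a binary and the shared libraries ldd reports into a
# RAM-backed directory, preserving their paths, so they can still be
# loaded after the NVMe root filesystem stops responding.

# Print the absolute library paths found in ldd output (read from stdin).
ldd_paths() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# stage_binary NAME DEST: copy the NAME executable and its libraries
# under DEST, keeping their original directory layout.
stage_binary() {
  local bin dest="$2" f files=()
  bin="$(type -P "$1")"
  [ -n "$bin" ] || { echo "not found: $1" >&2; return 1; }
  mkdir -p "$dest" || return 1
  mapfile -t files < <(printf '%s\n' "$bin"; ldd "$bin" | ldd_paths)
  for f in "${files[@]}"; do
    # -L follows symlinks so the real files land in RAM.
    cp -L --parents "$f" "$dest" || return 1
  done
}
```

For example, run `stage_binary lspci /dev/shm/rescue` before triggering the failure. (`/dev/shm` is normally a tmpfs on Ubuntu.) Running the staged copy would then mean chrooting into that directory, probably with `/proc` and `/sys` bind-mounted. That part is an untested assumption.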

Also, as a note: if I use REISUB
(https://en.m.wikipedia.org/wiki/Magic_SysRq_key#Uses) to reboot the
machine, it enters a Dell BIOS/recovery screen stating "No Hard Disk is
found". After a full power off, the machine works again.


[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Alan Pope
I've tried doing various I/O-intensive things to trigger it, but no luck
yet.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic 
root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
  RelatedPackageVersions:
   linux-restricted-modules-5.8.0-34-generic N/A
   linux-backports-modules-5.8.0-34-generic  N/A
   linux-firmware

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Andrew Hayzen
Note: for me it happens quite quickly, sometimes after only 5-10 minutes
of high disk load. E.g. the first times it happened were while apt was
running update-grub and then while pip3 install was running. Then, to
capture the logs above, I started a `find /` and a `find ~` at the same
time, and this was enough to break it.
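A minimal sketch of the load pattern described above, assuming bash. `stress_walk` is a hypothetical helper (not something from this thread) that starts one `find` per directory tree in the background and waits for all of them. It only recreates the reported `find /` plus `find ~` load; it is not a confirmed reproducer, and others in this thread have not been able to trigger the bug with I/O load alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: walk several directory trees at the same time to
# generate heavy metadata I/O, like running `find /` and `find ~` together.
# This only recreates the reported load; it is not a confirmed reproducer.
stress_walk() {
  local path
  for path in "$@"; do
    # One background find per tree; output and permission errors are
    # discarded because only the I/O they cause matters here.
    find "$path" > /dev/null 2>&1 &
  done
  # Return only after every background walk has finished.
  wait
}
```

For example, keep `dmesg --follow` running in one terminal and run `stress_walk / ~` in another, on a machine whose data is backed up.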

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Alan Pope
I can try, but I can't trigger it on demand. I had 60 days of uptime
before it happened last time, and 12 days the time before that, which
gives you some idea of the interval between failures.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-11 Thread Andrew Hayzen
@kairhengfeng Yes, this is a regression after the upgrade from 5.4 to
5.8. After the upgrade I hit it multiple times, and now that I have
switched back to 5.4 my machine is stable again.

I do not think I can run `lspci -vv` *after* the issue happens, as my
NVMe drive goes read-only, so all commands fail.

This is the output of `sudo lspci -vv` on kernel 5.4, *before* it
happens: https://pastebin.ubuntu.com/p/tCshwbhpqs/ Let me know whether
also running this on 5.8 *before* it happens would be useful.

@popey are you able to run this command before and after it happens on
your dual-disk system?
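Since the drive goes read-only, one option (an untested assumption for this bug) is to write the before/after `lspci -vv` snapshots to RAM-backed storage instead of the NVMe drive. `snapshot_lspci` is a hypothetical helper that saves a timestamped dump, plus the running kernel version, to /dev/shm (a tmpfs on a stock Ubuntu install) by default. On a single-disk system even starting lspci may fail once the drive is gone, so this is mainly useful on a dual-disk machine; copy the file somewhere persistent before rebooting, because tmpfs is cleared on reboot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a timestamped `lspci -vv` snapshot to a
# directory that is not on the NVMe drive (default: /dev/shm, RAM-backed).
# Run it from a root shell so lspci can read the full config space.
snapshot_lspci() {
  local label=${1:?usage: snapshot_lspci LABEL [DIR]}
  local dir=${2:-/dev/shm}
  local out
  out="$dir/lspci-$label-$(date +%Y%m%dT%H%M%S).txt"
  # Record the running kernel next to the PCI dump so that 5.4 and 5.8
  # snapshots can be told apart later.
  {
    echo "# kernel: $(uname -r)"
    lspci -vv
  } > "$out" || return 1
  # Print the path so it can be copied off the machine afterwards.
  echo "$out"
}
```

For example, run `snapshot_lspci before` after boot and `snapshot_lspci after` once the drive has dropped out, then attach both files.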

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-10 Thread Kai-Heng Feng
Is this a regression? Did it start to happen after the upgrade from 5.4
to 5.8?

And is it possible to attach `lspci -vv` output from after the issue
happens?

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-09 Thread Alan Pope
It's the TOSHIBA-RD400 on /home for me that's failing.

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-09 Thread Andrew Hayzen
I'm on Ubuntu 20.04, and since recently updating to the HWE 5.8 kernel I
have also been suffering from my NVMe drive becoming read-only after a
period of time. I have now switched back to the 5.4 kernel and have not
hit the issue again.

I am on a single-disk system, so I had to run `dmesg --follow` remotely
from another machine to retrieve the log information.
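The remote-logging approach above can be sketched as follows, assuming SSH access from the second machine and bash on both ends. `capture_remote_dmesg` and `nvme_failure_lines` are hypothetical names; the grep pattern just matches the message types seen in this bug's logs (timeouts, aborts, failed resets, device removal, I/O errors) and is not exhaustive.

```shell
#!/usr/bin/env bash
# Run on the *second* machine. HOST is an SSH destination you supply.
# If the remote kernel restricts dmesg to root (kernel.dmesg_restrict=1),
# use `ssh -t HOST sudo dmesg --follow` instead of the plain command.
capture_remote_dmesg() {
  local host=${1:?usage: capture_remote_dmesg HOST [LOGFILE]}
  local log=${2:-"$host-dmesg.log"}
  # Append so that a reconnect does not overwrite earlier output.
  ssh "$host" dmesg --follow | tee -a "$log"
}

# Print only the lines from a saved log that match the failure pattern
# reported in this bug.
nvme_failure_lines() {
  grep -E 'timeout, (aborting|reset controller)|Device not ready|Removing after probe failure|I/O error' "$1"
}
```

For example, `capture_remote_dmesg user@xps` during the lockup, then `nvme_failure_lines user@xps-dmesg.log` afterwards.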

Here is a pastebin from around the time my system locks up:
https://pastebin.ubuntu.com/p/FKsJV8VwRw/ (note it has similar errors: a
timeout and abort, then a reset, then a call trace, etc.).
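The bug description's log includes `Device not ready; aborting reset, CSTS=0x1` and `Abort status: 0x371`. The sketch below decodes both values, with field layouts taken from my reading of the NVMe base specification, so treat it as a hedged aid rather than a diagnosis. `decode_csts 0x1` gives RDY=1 and CFS=0: the controller still reports ready and no fatal status. My understanding is that the kernel prints this message when it gives up waiting for RDY to clear while resetting. `decode_status 0x371` gives SCT 0x3 (path-related status) and SC 0x71, which the specification lists as "Command Aborted By Host". The kernel appears to assign this to requests it cancels itself during controller teardown, rather than it being an error reported by the drive.

```shell
#!/usr/bin/env bash
# CSTS (Controller Status register): bit 0 RDY, bit 1 CFS (controller
# fatal status), bits 3:2 SHST (shutdown status), bit 4 NSSRO.
decode_csts() {
  local v=$(( $1 ))
  echo "RDY=$(( v & 1 )) CFS=$(( (v >> 1) & 1 )) SHST=$(( (v >> 2) & 3 )) NSSRO=$(( (v >> 4) & 1 ))"
}

# Request status as printed by the Linux NVMe driver ("Abort status: 0x..."):
# bits 7:0 Status Code (SC), bits 10:8 Status Code Type (SCT), bit 14 DNR.
decode_status() {
  local v=$(( $1 ))
  printf 'SCT=0x%x SC=0x%02x DNR=%d\n' \
    $(( (v >> 8) & 7 )) $(( v & 0xff )) $(( (v >> 14) & 1 ))
}
```

Usage: `decode_csts 0x1` prints `RDY=1 CFS=0 SHST=0 NSSRO=0`, and `decode_status 0x371` prints `SCT=0x3 SC=0x71 DNR=0`.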

Here is a pastebin of the smartctl output:
https://pastebin.ubuntu.com/p/W9w2nHYhd2/ The drive itself appears to be
fine and not failing (it does seem to increment "Error Information Log
Entries" when this lockup happens, but when viewing the error it is just
full of 0x).
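To compare that counter before and after a lockup, a small hypothetical helper can pull the "Error Information Log Entries" value out of `smartctl -a`. The default device /dev/nvme0 is an assumption; pass the failing controller instead, and run it as root. `smartctl -l error /dev/nvme0` should then show the log entries themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Error Information Log Entries" counter
# reported by smartctl for an NVMe controller (default /dev/nvme0).
nvme_error_entries() {
  local dev=${1:-/dev/nvme0}
  # Split on ':' and strip spaces, tabs and thousands separators from
  # the value, leaving a bare number.
  smartctl -a "$dev" | awk -F: '/Error Information Log Entries/ { gsub(/[ \t,]/, "", $2); print $2 }'
}
```

For example, note the value of `nvme_error_entries` after boot and again after the next lockup; an increase matches what is described above.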


System info when the lockup happened:

Machine: Dell XPS 13 9360
Drive: THNSN5512GPUK NVMe TOSHIBA 512GB
Kernel at the time:
$ apt policy linux-image-generic-hwe-20.04
linux-image-generic-hwe-20.04:
  Installed: 5.8.0.36.40~20.04.21
  Candidate: 5.8.0.36.40~20.04.21
  Version table:
 *** 5.8.0.36.40~20.04.21 500
        500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
        100 /var/lib/dpkg/status
     5.4.0.26.32 500
        500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages

Let me know if I can provide any more info :-)

[Kernel-packages] [Bug 1910866] Re: nvme drive fails after some time

2021-01-09 Thread Kai-Heng Feng
Which one is the failing one? Samsung or OCZ?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Sorry for the vague title. I thought this was a hardware issue until
  someone else online mentioned their nvme drive goes "read only" after
  some time. I tend not to reboot my system much, so have a large
  journal. Either way this happens once in a while. The / drive is fine,
  but /home is on nvme which just disappears. I reboot and everything is
  fine. But leave it long enough and it'll fail again.

  Here's the most recent snippet about the nvme drive before I restarted
  the system.

  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
  Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
  Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
  Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
  Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
  Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
  Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D0   731  2 0x4000
  Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
  Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
  Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
  Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: linux-image-5.8.0-34-generic 5.8.0-34.37
  ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
  Uname: Linux 5.8.0-34-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu50.3
  Architecture: amd64
  CasperMD5CheckResult: skip
  CurrentDesktop: ubuntu:GNOME
  Date: Sat Jan  9 11:56:28 2021
  InstallationDate: Installed on 2020-08-15 (146 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: Intel Corporation NUC8i7HVK
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic 
root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
  RebootRequiredPkgs:
   linux-image-5.8.0-36-generic
   linux-base
  RelatedPackageVersions:
   linux-restricted-modules-5.8.0-34-generic N/A
   linux-backports-modules-5.8.0-34-generic  N/A
   linux-firmware                            1.190.2
  SourcePackage: