Public bug reported:

Bug Description:
This problem seems related to the following reported bugs:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748

System Configuration:
The server is populated with a hot-swappable 1.9 TB 7 mm U.2 NVMe drive and a 1.9 TB 2.5” SATA SSD; either drive can serve as the system boot drive. In addition, there are twelve (12) 7 TB 3.5” SAS HDDs for RAID data storage. Currently, the SATA SSD is used as the boot device. Below is the output of lsb_release and cat /proc/cmdline.

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

BOOT_IMAGE=/boot/vmlinuz-4.15.0-43-generic
root=UUID=20881db9-36c3-4d83-9688-da36477e62e3 ro


Problem:
On both Xenial and Bionic running kernel 4.15, the NVMe drive has severe problems: it intermittently disappears from the output of lsblk and nvme list while spewing a large number of Buffer I/O errors on the console. The lspci output shows that the controller is intermittently disconnecting from the PCIe bus.
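
One quick way to confirm the intermittent disconnects independently of the block layer is to poll the controller's PCI address directly. This is only a suggested check, not taken from the attached logs; 0000:18:00.0 is the address reported in the dmesg output further below and may differ on other systems.
-------------------------
# Poll the PCIe slot of the NVMe controller once per second.
# 0000:18:00.0 is the address from the dmesg output below; adjust as needed.
while true; do
    if lspci -s 0000:18:00.0 | grep -q .; then
        echo "$(date '+%F %T')  controller present on PCIe bus"
    else
        echo "$(date '+%F %T')  controller MISSING from PCIe bus"
    fi
    sleep 1
done
-------------------------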

Symptom:
On the initial boot after a fresh install, the NVMe drive is recognized by the lsblk, lspci and nvme list commands. The drive then goes missing for no obvious reason, reappears at random, and this disconnect/reconnect cycle repeats continuously even while the drive is idle. The output of dmesg, syslog and the journal is the same in each case: it shows the unexpected shutdown of the drive. See the output below (for the full text, see the attached dmesg, syslog and journalctl files).
-------------------------
[  485.296413] nvme nvme1: pci function 0000:18:00.0
[  485.296516] nvme 0000:18:00.0: enabling device (0100 -> 0102)
[  485.406992] nvme nvme1: Shutdown timeout set to 8 seconds
[  485.409433] nvme nvme1: failed to mark controller state 1
[  485.409435] nvme nvme1: Removing after probe failure status: 0
-------------------------
This symptom is present on both 16.04 LTS and 18.04 LTS running kernel 4.15. The same symptom was also evident on upstream kernels 4.18.3 and 4.18.19.

Work-around:
With kernel 4.19 the symptom does not manifest; the random disconnection of the NVMe controller from the PCIe bus no longer occurs. In addition, lspci, lsblk and nvme list consistently show the drive, and the NVMe drive never goes missing during a scan.
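
To verify the work-around on Bionic without waiting for an updated distro kernel, one option is to boot a mainline test build. The steps below are only a sketch; the exact .deb file names depend on the 4.19 point release chosen from https://kernel.ubuntu.com/~kernel-ppa/mainline/.
-------------------------
# Sketch: test a mainline 4.19 kernel (assumed workflow, not part of this report).
# Download the matching linux-image, linux-modules and linux-headers .deb files
# for your architecture from the mainline build page, then:
sudo dpkg -i ./linux-*.deb
sudo reboot
# After reboot, confirm the running kernel and re-check the drive:
uname -r
lsblk
sudo nvme list
-------------------------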

How to reproduce:
1. Use a Samsung U.2 NVMe SM961/PM961 drive.
2. Install either Ubuntu 16.04.5 or 18.04.1 running kernel 4.15.
3. Boot and wait until the system has recognized all the installed drives.
4. Run lsblk, lspci and nvme list repeatedly (a small monitoring loop is sketched after this list). The NVMe drive will randomly go missing and will spew a large number of buffer I/O errors. On 16.04.5 LTS the symptom is more severe: after a fresh install the drive appears once, but once it goes missing it never comes back.
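
As referenced in step 4, a small loop such as the one below makes the disappearance easy to spot. This is a minimal sketch; /dev/nvme1n1 is assumed from the nvme1 controller in the dmesg output above and may differ on other systems.
-------------------------
#!/bin/sh
# Minimal sketch: log once per second whether the NVMe namespace is still visible.
# /dev/nvme1n1 is an assumption based on the "nvme1" controller in dmesg above.
DEV=/dev/nvme1n1
while true; do
    if [ -b "$DEV" ]; then
        echo "$(date '+%F %T')  $DEV present"
    else
        echo "$(date '+%F %T')  $DEV MISSING"
    fi
    sleep 1
done
-------------------------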

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete


** Tags: bionic

** Attachment added: "dmesg.txt"
   https://bugs.launchpad.net/bugs/1810548/+attachment/5226958/+files/dmesg.txt
