I'm investigating this issue and have built a kernel with the following
two patches:

a) 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7776db1ccc1
 
b) A debug patch present in 
http://lists.infradead.org/pipermail/linux-nvme/2017-February/008498.html 

The idea of the first patch, which was merged upstream in Linux 4.12, is to
poll the completion queue of the device in the event of a timeout. If the
poll finds the completion, the device did post it but the driver never got
the interrupt, so it could be an adapter issue.

The idea of the second patch is just to provide debug information in case
of a mismatch in the choice of the blk-mq hw queue in the nvme driver. It's
a debug patch proposed on the mailing list to address a similar bug report
in the past.

The kernel with the debug patches is available in a PPA. To install it,
follow the instructions below:

a) sudo add-apt-repository ppa:gpiccoli/test-nvme-182638 
b) sudo apt-get update 
c) sudo apt-get install linux-image-4.4.0-1073-aws 

After the installation is complete, please reboot the instance. Once it has
restarted, check the output of "uname -rv", which should be:

"4.4.0-1073-aws #83+hf182638v20181129b1-Ubuntu SMP Fri Nov 30 17:09:30
UTC 2018"
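
For convenience, the check above could be scripted roughly as follows. This
is only a sketch: the helper name is_test_kernel is made up for illustration,
and it simply matches the release and the "hf182638" tag from the version
string quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given "uname -rv" string looks
# like the test build (release 4.4.0-1073-aws carrying the hf182638 tag).
is_test_kernel() {
    case "$1" in
        4.4.0-1073-aws*hf182638*) return 0 ;;
        *) return 1 ;;
    esac
}

# Compare against the kernel that is currently running.
if is_test_kernel "$(uname -rv)"; then
    echo "test kernel is running"
else
    echo "different kernel is running: $(uname -rv)"
fi
```

If the second message is printed, the instance most likely booted into
another installed kernel and the test kernel has to be selected at boot.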

Please note that this is a test kernel: it should not be used in any
production environment, and it is not officially supported in any form.

If anybody can test this, it would be much appreciated. Please post the
complete dmesg output after the issue is triggered, if it triggers at all.

Thanks,

Guilherme

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1788035

Title:
  nvme: avoid cqe corruption
