I'm investigating this issue and have built a kernel with the following two patches:

a) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7776db1ccc1
b) A debug patch posted in http://lists.infradead.org/pipermail/linux-nvme/2017-February/008498.html

The first patch, which was merged upstream in Linux 4.12, polls the completion queue of the device in the event of a timeout. If the poll finds the completion, the device did post it and the driver missed the interrupt; if it finds nothing, the device never posted a completion. Either result helps tell whether this could be an adapter issue.

The second patch only provides debug information in case of a mismatch in the choice of the blk-mq hw queue in the nvme driver. It is a debug patch proposed on the mailing list to address a similar bug report in the past.

The kernel with the debug patches is available in a PPA. To install it, follow the instructions below:

a) sudo add-apt-repository ppa:gpiccoli/test-nvme-182638
b) sudo apt-get update
c) sudo apt-get install linux-image-4.4.0-1073-aws

After the installation is complete, please reboot the instance. Once it has restarted, check the "uname -rv" output, which should be:

"4.4.0-1073-aws #83+hf182638v20181129b1-Ubuntu SMP Fri Nov 30 17:09:30 UTC 2018"

Please note that this is a test kernel: it should not be used in any production environment, and it is not officially supported in any form.

If anybody can test this, it would be much appreciated. Please post the complete dmesg after/if the issue is triggered.

Thanks,

Guilherme

--
https://bugs.launchpad.net/bugs/1788035
Title: nvme: avoid cqe corruption
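
For reference, the "uname -rv" check and the dmesg capture requested above could be scripted roughly as follows. This is only a sketch: the helper names (is_test_kernel, collect_dmesg) and the output file name are made up for illustration, and the expected string is copied from the message above.

```shell
#!/bin/sh
# Expected "uname -rv" string for the test kernel, as quoted in the message.
EXPECTED='4.4.0-1073-aws #83+hf182638v20181129b1-Ubuntu SMP Fri Nov 30 17:09:30 UTC 2018'

# is_test_kernel (hypothetical helper): succeeds only if the given
# "uname -rv" string matches the test kernel exactly.
is_test_kernel() {
    [ "$1" = "$EXPECTED" ]
}

# collect_dmesg (hypothetical helper): saves the complete kernel log to a
# file so it can be attached to the bug. The default file name is an
# arbitrary choice; dmesg may need sudo if the log is restricted.
collect_dmesg() {
    out="${1:-dmesg-nvme.txt}"
    dmesg > "$out" && echo "dmesg saved to $out"
}

# Check the running kernel after the reboot.
if is_test_kernel "$(uname -rv)"; then
    echo "test kernel is running"
else
    echo "WARNING: not running the test kernel: $(uname -rv)" >&2
fi
```

After/if the issue is triggered, calling collect_dmesg (optionally with a file name) from the same shell would save the log for posting.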
