> Is there any update on this issue?

I'm still debugging, but can give a summary of the problem, at least on
the system I'm working on.

First, to address:
> One is the vfree WARNING which indicates that the error paths are not quite 
> right

This is a trivial fix to the error handling path; it's fixed upstream by
commit f58944e265d4ebe47216a5d7488aee3928823d30 ('NVMe: Simplify device
reset failure'), or alternatively can be fixed more simply by removing the
unmap call from the error path, since the later removal also unmaps
(which is where the error comes from: unmapping already-unmapped
memory).  I'll SRU a fix for this, but of course this is just a minor
annoyance, since the real problem is why the device timed out during
probing.

> why the NVMe drive failed to function correctly - which is the primary
> issue for this test case

The problem here, at least on the system I have access to for debugging,
is that the nvme driver isn't able to obtain the interrupt for the nvme
controller admin queue, so the requests sent to the admin queue (to
configure the controller) are never completed and eventually time out,
which leads to the device probing being aborted.
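
One rough way to check whether admin queue interrupts are arriving at all
is to look at the per-CPU counters in /proc/interrupts.  The sketch below
is only an illustration: it assumes the driver's vectors show up with
"nvme" somewhere on the line (e.g. a name like nvme0q0 for the admin
queue), and the SAMPLE text is made up for the example, not output from
the affected system.  The helper name nvme_irq_counts is hypothetical.

```python
import os


def nvme_irq_counts(interrupts_text):
    """Sum the per-CPU counters of every nvme line in /proc/interrupts-style text.

    Returns a dict mapping the last field of the line (the action name)
    to the total interrupt count across all listed CPUs.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    # The header row lists one column per online CPU: "CPU0 CPU1 ..."
    ncpus = len(lines[0].split())
    counts = {}
    for line in lines[1:]:
        if "nvme" not in line:
            continue
        fields = line.split()
        # fields[0] is the irq number ("24:"), then one counter per CPU.
        per_cpu = fields[1:1 + ncpus]
        counts[fields[-1]] = sum(int(c) for c in per_cpu if c.isdigit())
    return counts


# Made-up sample input, only to show the expected layout.
SAMPLE = """\
           CPU0       CPU1
 24:          0          0   PCI-MSI 1048576-edge      nvme0q0
 25:        120         98   PCI-MSI 524288-edge      eth0
"""

if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    for name, total in nvme_irq_counts(text).items():
        print(name, total)
```

A total that stays at zero for the admin queue while the probe is still
waiting would be consistent with the interrupt never being delivered.
Once probing has been aborted the line may already be gone, so a missing
entry by itself doesn't prove much.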

The reason this used to work (e.g. with stock 4.4.x) is that the nvme driver
used to use a kthread to poll the admin queue and complete any work
directly.  The kthread polling was removed upstream by commit
79f2b358c9ba373943a9284be2861fde58291c4e ('nvme: don't poll the CQ from
the kthread'), which was pulled into the xenial kernel at 4.4.0-35.
However, probe-time kthread polling had already been removed by upstream
commit 7385014c073263b077442439299fad013edd4409 ('nvme: only add a
controller to dev_list after it's been fully initialized'), which came
into xenial very early, at 4.4.0-3.  So if the irq doesn't work, the nvme
probing times out.

As to why the nvme driver can't obtain the irq, I can't determine that yet;
it's what I'm still debugging.  If you have a test system I can
access, I can debug there as well, or at least provide a debug kernel to
try (but I'm still adding more debug output).  I'm not working on a ppc
system, so it's possible something else is going on there.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1639920

Title:
  NVMe detection failed during bootup

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1639920/+subscriptions
