Date: Mon, 19 Sep 2016 01:36:14 +0000
From: David Holland <dholland-sourcechan...@netbsd.org>
On Sun, Sep 18, 2016 at 09:52:37PM +0000, Jaromir Dolecek wrote:
 > Modified Files:
 > 	src/sys/dev/ic: ld_nvme.c
 >
 > Log Message:
 > must use PR_NOWAIT also during ldattach()/dkwedge discover, our
 > i/o is there called with a spin lock held, which triggers
 > LOCKDEBUG panic

That sounds like it should be corrected upstream of nvme...?

Here's the relevant call tree:

   ld_diskstart
      acquires sc->sc_mutex (an IPL_VM spin lock)
      calls sc->sc_start = ld_nvme_start
         calls ld_nvme_dobio
            calls nvme_ns_get_ctx with PR_WAITOK or PR_NOWAIT
      releases sc->sc_mutex

In this call tree, it is *never* correct to use PR_WAITOK, because the
spin lock is unconditionally held. And the only other caller of
ld_nvme_dobio is ld_nvme_dump, which is probably not allowed to block
either for other reasons -- which presumably led to the abuse of
cpu_(soft)intr_p here.

Instead, you should prime nvme_ns_ctx_cache up front, e.g. by calling
pool_cache_setlowat in ld_nvme_attach (and maintaining a count of the
nvme instances), and always use PR_NOWAIT.

Also, you need to handle failure in nvme_ns_get_ctx. From a cursory
examination of dk_start, it looks like returning EAGAIN will DTRT --
dk_start will retry the block later.

(Incidentally, it looks like nvme is statically wired to use a maximum
queue depth of 128. In that case, you could just use a fixed array of
128 entries -- that's just a few pages of RAM. But maybe you want to
be able to change the queue depth later.)