On Sep 21 09:50, Keith Busch wrote: > On Fri, Sep 18, 2020 at 10:36:07PM +0200, Klaus Jensen wrote: > > @@ -466,15 +476,21 @@ static void nvme_post_cqes(void *opaque) > > break; > > } > > > > - QTAILQ_REMOVE(&cq->req_list, req, entry); > > sq = req->sq; > > req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase); > > req->cqe.sq_id = cpu_to_le16(sq->sqid); > > req->cqe.sq_head = cpu_to_le16(sq->head); > > addr = cq->dma_addr + cq->tail * n->cqe_size; > > + ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe, > > + sizeof(req->cqe)); > > + if (ret) { > > + trace_pci_nvme_err_addr_write(addr); > > + timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + > > + 500 * SCALE_MS); > > + break; > > + } > > + QTAILQ_REMOVE(&cq->req_list, req, entry); > > Is this error really ever transient such that a retry later may be > successful? I didn't find a way that appeared that it could be. If not, > this probably deserves a CFS condition rather than a retry. >
I'll admit that I did this patch with the blktests test case in mind, and that retrying causes the test to pass. But I think you are right that retrying might not be the right way to "pass the test". I tested and setting CFS also passes the test and now actually exercises the reset path in the kernel. So wins all around. I am thinking we do the same for a failed read in nvme_process_sq then?
signature.asc
Description: PGP signature