On Thu, Feb 16, 2023 at 04:27:42PM +0100, Kevin Wolf wrote:
> Am 10.02.2023 um 15:32 hat Stefan Hajnoczi geschrieben:
> > dma_blk_cb() only takes the AioContext lock around ->io_func(). That
> > means the rest of dma_blk_cb() is not protected. In particular, the
> > DMAAIOCB field accesses happen outside the lock.
> >
> > There is a race when the main loop thread holds the AioContext lock
> > and invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
> > dma_aio_cancel() while an IOThread executes dma_blk_cb(). The
> > dbs->acb field determines how cancellation proceeds. If
> > dma_aio_cancel() sees dbs->acb == NULL while dma_blk_cb() is still
> > running, the request can be completed twice (-ECANCELED and the
> > actual return value).
> >
> > The following assertion can occur with virtio-scsi when an IOThread
> > is used:
> >
> >   ../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion
> >   `r->req.aiocb != NULL' failed.
> >
> > Fix the race by holding the AioContext across dma_blk_cb(). Now
> > dma_aio_cancel() under the AioContext lock will not see
> > inconsistent/intermediate states.
> >
> > Cc: Paolo Bonzini <pbonz...@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
>
> Two things that seem to need attention in the review:
>
> > diff --git a/softmmu/dma-helpers.c b/softmmu/dma-helpers.c
> > index 7820fec54c..2463964805 100644
> > --- a/softmmu/dma-helpers.c
> > +++ b/softmmu/dma-helpers.c
> > @@ -113,17 +113,19 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
> >  static void dma_blk_cb(void *opaque, int ret)
> >  {
> >      DMAAIOCB *dbs = (DMAAIOCB *)opaque;
> > +    AioContext *ctx = dbs->ctx;
> >      dma_addr_t cur_addr, cur_len;
> >      void *mem;
> >
> >      trace_dma_blk_cb(dbs, ret);
> >
> > +    aio_context_acquire(ctx);
>
> During the first iteration, the caller may already hold the AioContext
> lock. In the case of scsi-disk, it does. Locking a second time is fine
> in principle because it's a recursive lock, but we have to be careful
> not to call AIO_WAIT_WHILE() indirectly then because it could deadlock.
>
> Except for the dbs->common.cb (see below) I don't see any functions
> that would be problematic in this respect. In fact, the one I would be
> most worried about is dbs->io_func, but it was already locked before.
>
> >      dbs->acb = NULL;
> >      dbs->offset += dbs->iov.size;
> >
> >      if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
> >          dma_complete(dbs, ret);
>
> All request callbacks hold the AioContext lock now when they didn't
> before. I wonder if it's worth documenting the locking rules for
> dma_blk_io() in a comment. Could be a separate patch, though.
>
> You remove the locking in scsi_dma_complete() to compensate. All the
> other callers come from IDE and nvme, which don't take the lock
> internally. Taking the main AioContext lock once is fine, so this
> looks good.
>
> If it is possible that we already complete on the first iteration,
> this could however also be affected by the case above so that the
> AioContext is locked twice. In this case, the invoked callback must
> not call AIO_WAIT_WHILE() and we would need to audit IDE and nvme.
>
> Is it possible? In other words, can dbs->sg->nsg be 0? If not, can we
> assert and document it?
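[An aside on the AIO_WAIT_WHILE() pitfall Kevin mentions, for readers who
have not run into it: roughly speaking, AIO_WAIT_WHILE() releases the
AioContext lock once while it polls for progress, so if the lock is
currently held twice (recursively), one level remains held, the thread
that has to make progress can never take the lock, and the wait spins
forever. The standalone sketch below reproduces that shape with a plain
recursive pthread mutex; it is only an analogy, not QEMU's actual
AioContext/AIO_WAIT_WHILE code, and every name in it is invented.

    /* recursive_lock_deadlock.c - standalone analogy, not QEMU code.
     * Build with: cc -pthread recursive_lock_deadlock.c
     * As written, this program deadlocks; drop the second lock in main()
     * and it finishes normally.
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t ctx_lock;   /* stands in for the AioContext lock */
    static atomic_int request_done;    /* stands in for the wait condition  */

    /* The "IOThread": it needs the lock to complete the request. */
    static void *iothread_fn(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&ctx_lock); /* blocks forever: main() still holds
                                        * one level of the recursive lock   */
        atomic_store(&request_done, 1);
        pthread_mutex_unlock(&ctx_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_t iothread;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&ctx_lock, &attr);

        pthread_mutex_lock(&ctx_lock); /* outer acquire, e.g. the caller    */
        pthread_mutex_lock(&ctx_lock); /* inner acquire, e.g. dma_blk_cb()  */

        pthread_create(&iothread, NULL, iothread_fn, NULL);

        /* "AIO_WAIT_WHILE": drop the lock once, then wait for progress. */
        pthread_mutex_unlock(&ctx_lock);
        while (!atomic_load(&request_done)) {
            usleep(1000);              /* never terminates as written       */
        }

        pthread_mutex_unlock(&ctx_lock);
        pthread_join(iothread, NULL);
        printf("finished\n");
        return 0;
    }

This is why holding the lock exactly once around the callback matters: a
single release is then enough to let the other thread run.]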
In the nsg == 0 case there's another problem: the completion callback
function is invoked and the AIOCB is unref'd before dma_blk_io() returns
the stale AIOCB pointer. That would lead to problems later because the
pattern is typically:

  r->aiocb = dma_blk_io(...);
  ... use r and r->aiocb later

So I don't think nsg == 0 makes sense. The functions I looked at invoke
dma_blk_io() only when there are still bytes to transfer. I think we're
safe, but I'll admit I'm not 100% sure.

> > -        return;
> > +        goto out;
> >      }
> >      dma_blk_unmap(dbs);
> >
> > @@ -164,9 +166,9 @@ static void dma_blk_cb(void *opaque, int ret)
> >
> >      if (dbs->iov.size == 0) {
> >          trace_dma_map_wait(dbs);
> > -        dbs->bh = aio_bh_new(dbs->ctx, reschedule_dma, dbs);
> > +        dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs);
>
> Does copying dbs->ctx to a local variable imply that it may change
> during the function? I didn't think so, but if it may, then why is
> calling aio_bh_new() with the old value right?

I changed this line for consistency, not to change behavior or fix a
bug. Regarding AioContext changes, they can't happen because no function
that changes the AioContext is called between this line and the earlier
aio_context_acquire(). (Having to worry about AioContext changes is a
pain, though, and I look forward to when we can get rid of this lock.)

Stefan
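[To make the nsg == 0 hazard concrete: the problem is the classic one of an
"asynchronous start" function completing, and freeing its own handle, before
it has returned that handle to the caller. The standalone sketch below mimics
the shape Stefan describes; Handle, Request, start_io() and
request_complete() are invented names, it is not QEMU code, and it omits the
reference counting that real AIOCBs use.

    /* stale_handle.c - standalone illustration, not QEMU code. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Request Request;

    typedef struct Handle {
        void (*cb)(Request *r, int ret);   /* completion callback          */
        Request *req;
    } Handle;

    struct Request {
        Handle *h;                         /* caller stores the handle here */
    };

    static void request_complete(Request *r, int ret)
    {
        /* Plays the role of dbs->common.cb: note that r->h has not been
         * assigned yet when this runs from inside start_io().             */
        printf("completed, ret=%d, r->h=%p\n", ret, (void *)r->h);
    }

    /* Plays the role of dma_blk_io() with an empty scatter-gather list:
     * nothing to transfer, so it completes and frees the handle before
     * returning it.                                                       */
    static Handle *start_io(size_t nbytes, Request *r)
    {
        Handle *h = malloc(sizeof(*h));
        h->cb = request_complete;
        h->req = r;
        if (nbytes == 0) {
            h->cb(h->req, 0);              /* completes immediately ...    */
            free(h);                       /* ... and the handle is gone   */
        }
        return h;                          /* dangling when nbytes == 0    */
    }

    int main(void)
    {
        Request r = { .h = NULL };
        r.h = start_io(0, &r);             /* stores a freed pointer       */
        printf("caller stored r.h=%p, already freed\n", (void *)r.h);
        return 0;
    }

In the real code the analogous pointer is the BlockAIOCB returned by
dma_blk_io(), which cancellation paths like bdrv_aio_cancel() would later
dereference, so an nsg == 0 caller really would be broken.]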