On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> On 30.01.2018 at 16:38, Stefan Hajnoczi wrote:
> > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > before main() quits") introduced iothread_stop_all() to avoid the
> > following virtio-scsi assertion failure:
> >
> >   assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> >
> > Back then the assertion failed because when bdrv_close_all() made
> > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > instead of s->ctx.
> >
> > The same assertion can still fail today when vcpus submit new I/O
> > requests after iothread_stop_all() has moved the BDS to the global
> > AioContext.
> >
> > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > before calling iothread_stop_all().
> >
> > Note that the assertion failure is a race condition.  It is not possible
> > to reproduce it reliably.
> >
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
>
> Does pausing the vcpus actually make sure that the iothread isn't active
> any more, or do we still have a small window where the vcpu is already
> stopped, but the iothread is still processing requests?
>
> Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> does either not have any effect, or if it does have an effect, it's
> wrong. You can't just force an in-use BDS into a different AioContext
> when the user that set the AioContext is still there.
>
> At the very least, do we need a blk_drain_all() before stopping the
> iothreads?

bdrv_set_aio_context() contains aio_disable_external() +
bdrv_parent_drained_begin() + bdrv_drain(bs).  This should complete all
requests, even those sitting in a descriptor ring that hasn't been
processed yet.

> It would still just be a hack, the proper way seems to be
> getting the virtio device out of dataplane mode so that the iothread is
> actually unused and doesn't just happen to not process something at the
> moment.

Agreed, the existing approach is a hack.  I'm not keen on implementing a
proper device<->IOThread detach operation because vl.c:main() seems to
be the only place that needs it - and it can get away with just
quiescing requests and the IOThread instead.

Stefan
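
For illustration only, the quiesce-then-switch ordering described above can
be modelled with a small standalone C sketch.  This is not QEMU code:
ToyDevice, toy_submit(), toy_drain() and toy_set_context() are hypothetical
names, and re-enabling submissions at the end of the switch is an assumption
of the model rather than something stated in this thread.

```c
/* Toy model only: none of these types or helpers are QEMU APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

typedef struct ToyDevice {
    int ring[RING_SIZE];     /* request ids queued by the guest */
    size_t ring_len;         /* queued requests not yet processed */
    size_t completed;        /* requests that have finished */
    bool external_disabled;  /* true: new guest submissions are refused */
    int ctx;                 /* id of the context that owns the device */
} ToyDevice;

/* Guest-side submission; refused while external sources are disabled. */
static bool toy_submit(ToyDevice *d, int req)
{
    if (d->external_disabled || d->ring_len == RING_SIZE) {
        return false;
    }
    d->ring[d->ring_len++] = req;
    return true;
}

/* Complete everything still sitting in the ring. */
static void toy_drain(ToyDevice *d)
{
    while (d->ring_len > 0) {
        d->ring_len--;
        d->completed++;
    }
}

/* Block new submissions, drain, then hand the device to new_ctx. */
static void toy_set_context(ToyDevice *d, int new_ctx)
{
    d->external_disabled = true;   /* stands in for aio_disable_external() */
    toy_drain(d);                  /* stands in for bdrv_drain(bs) */
    assert(d->ring_len == 0);      /* nothing may be pending at the switch */
    d->ctx = new_ctx;
    d->external_disabled = false;  /* assumption: re-enabled after the switch */
}
```

In this model, requests queued before toy_set_context() are all completed by
the time the context changes.  Once the switch is done, toy_submit() succeeds
again, which mirrors the window the patch closes by pausing vcpus first.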