Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 14:53, Stefan Hajnoczi wrote:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> There are many blk_aio_*() callers. Returning NULL forces them to
> perform extra error handling.

Most of them are for non-removable devices and check for a non-empty BB
at startup. The others (ide-cd, scsi-cd, sd) check the BlockBackend's
blk_is_available or blk_is_inserted state themselves.

It does require some auditing, of course, but remember that anything
that would return NULL is already crashing in bdrv_inc_in_flight today.
If it were that bad, we'd be seeing NULL pointer dereferences left and
right.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback? I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

No, I mean undoing that in 2.11.

Paolo
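[For reference, a minimal sketch of the caller-side pattern Paolo
describes, assuming the proposed NULL-return semantics were adopted.
blk_is_available() and blk_aio_flush() are the real QEMU APIs;
device_flush() itself is hypothetical.]

/* Hypothetical device callback, mirroring how ide-cd/scsi-cd guard
 * requests today; only blk_is_available() and blk_aio_flush() are
 * real QEMU APIs, the rest is illustrative. */
static void device_flush(BlockBackend *blk,
                         BlockCompletionFunc *cb, void *opaque)
{
    BlockAIOCB *acb;

    if (!blk_is_available(blk)) {
        /* No medium inserted: fail the request up front instead of
         * calling into the BB layer with an empty BDS graph. */
        cb(opaque, -ENOMEDIUM);
        return;
    }

    acb = blk_aio_flush(blk, cb, opaque);
    if (!acb) {
        /* The proposed NULL return: the medium went away in between. */
        cb(opaque, -ENOMEDIUM);
    }
}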
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 08.08.2017 um 14:53 hat Stefan Hajnoczi geschrieben:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> There are many blk_aio_*() callers. Returning NULL forces them to
> perform extra error handling.

Yes, that's my concern. We removed NULL returns a long time ago. Most
callers probably don't check for it any more.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback? I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

This doesn't help. As soon as you involve BHs, you need to consider
them during blk_drain(), otherwise the drain can return too early. And
if you want to consider them during blk_drain()... well, I made an
attempt, maybe we can make it work with some more changes. But I'm
starting to see that it's not a trivial change; though admittedly, the
NULL return thing doesn't look trivial either.

Kevin
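[To make the hazard concrete, a sketch of an -ENOMEDIUM completion
deferred to a one-shot BH, showing the accounting blk_drain() would
need so it cannot return before the callback has run. This is an
assumption about the shape such code would take, not a patch from this
thread; the helper names follow Kevin's patch quoted further down.]

/* Sketch: unless the BH is counted in blk->in_flight (taken before
 * scheduling, dropped only after the callback ran), blk_drain() can
 * return before the guest-visible completion has happened. */
static void enomedium_bh(void *opaque)
{
    BlkAioEmAIOCB *acb = opaque;

    acb->common.cb(acb->common.opaque, -ENOMEDIUM);
    blk_dec_in_flight(acb->rwco.blk);   /* only after the callback */
    qemu_aio_unref(acb);
}

static BlockAIOCB *blk_aio_fail_enomedium(BlockBackend *blk,
                                          BlockCompletionFunc *cb,
                                          void *opaque)
{
    BlkAioEmAIOCB *acb = blk_aio_get(&blk_aio_em_aiocb_info, blk,
                                     cb, opaque);

    blk_inc_in_flight(blk);             /* keeps blk_drain() waiting */
    aio_bh_schedule_oneshot(blk_get_aio_context(blk), enomedium_bh, acb);
    return &acb->common;
}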
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>> the root cause of this bug is related to this as well:
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>
>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>> functions always WILL have an attached BDS, but this is not always
>>> true, for instance, flushing the cache from an empty CDROM.
>>>
>>> Paolo, can we move the flight counter increment outside of the
>>> block-backend layer, is that safe?
>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>> regardless of the throttling timer issue discussed below. BB cannot
>> assume that the BDS graph is non-empty.
>
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS? That would make it much easier to fix.

There are many blk_aio_*() callers. Returning NULL forces them to
perform extra error handling.

When you say "temporarily" do you mean it returns NULL but schedules a
one-shot BH to invoke the callback? I wonder if we can use a singleton
aiocb instead of NULL for -ENOMEDIUM errors.

Stefan
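[One possible shape for the singleton idea, purely as a sketch of the
proposal; nothing like this exists in the tree and all names here are
invented. The shared aiocb can only be a handle for callers, so the
callback context still has to be carried separately.]

typedef struct {
    BlockCompletionFunc *cb;
    void *opaque;
} ENoMediumCompletion;

static void enomedium_cb_bh(void *opaque)
{
    ENoMediumCompletion *c = opaque;

    c->cb(c->opaque, -ENOMEDIUM);
    g_free(c);
}

/* Shared dummy handle returned for every -ENOMEDIUM failure. Callers
 * must never cancel or unref it; that is one of the open questions
 * with the idea. */
static BlockAIOCB blk_enomedium_acb;

static BlockAIOCB *blk_aio_enomedium(BlockBackend *blk,
                                     BlockCompletionFunc *cb,
                                     void *opaque)
{
    ENoMediumCompletion *c = g_new(ENoMediumCompletion, 1);

    c->cb = cb;
    c->opaque = opaque;
    aio_bh_schedule_oneshot(blk_get_aio_context(blk), enomedium_cb_bh, c);
    return &blk_enomedium_acb;
}

[As Kevin points out in his reply, the catch is that the one-shot BH
then has to be taken into account by blk_drain().]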
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 13:56, Kevin Wolf wrote:
> Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
>> On 08/08/2017 12:02, Kevin Wolf wrote:
>>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>>> the root cause of this bug is related to this as well:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>>
>>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>>> functions always WILL have an attached BDS, but this is not always
>>>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>>>
>>>>>> Paolo, can we move the flight counter increment outside of the
>>>>>> block-backend layer, is that safe?
>>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>>> regardless of the throttling timer issue discussed below. BB cannot
>>>>> assume that the BDS graph is non-empty.
>>>>
>>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>>> attached BDS? That would make it much easier to fix.
>>>
>>> Would the proper fix be much more complicated than the following? I
>>> must admit that I don't fully understand the current state of affairs
>>> with respect to threading, AioContext etc. so I may well be missing
>>> something.
>>
>> Not much, but it's not complete either. The issues I see are that: 1)
>> blk_drain_all does not take the new counter into account;
>
> Ok, I think this does the trick:
>
> void blk_drain_all(void)
> {
>     BlockBackend *blk = NULL;
>
>     bdrv_drain_all_begin();
>     while ((blk = blk_all_next(blk)) != NULL) {
>         blk_drain(blk);
>     }
>     bdrv_drain_all_end();
> }

Small note: blk_drain_all is called while no AioContext has been
acquired, while blk_drain requires you to acquire blk's AioContext.

And this of course should be split into blk_drain_all_begin/
blk_drain_all_end; see the sketch after this message.

>> 2) bdrv_drain_all callers need to be audited to see if they should be
>> blk_drain_all (or more likely, only device BlockBackends should be
>> drained).
>
> qmp_transaction() is unclear to me. It should be changed in some way
> anyway because it uses bdrv_drain_all() rather than a begin/end pair.

I think every ->prepare should do drained_begin and ->clean should end
the section. Most of them are doing it already.

> do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

Or just all devices. From the October 2016 discussion, "qdev drive
properties would install a vmstate notifier to quiesce their own
BlockBackend rather than relying on a blind bdrv_drain_all from
vm_stop" (though there'll surely be some non-qdevified straggler).

> xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do
> this, but I guess blk_drain_all(), too.

It seems to be an advisory operation triggered by balloon deflation.
Whatever.

> block_migration_cleanup() is just lazy and really means a blk_drain()
> for its own BlockBackends. blk_drain_all() as the simple conversion.
>
> migration/savevm: Migration wants blk_drain_all() to get the devices
> quiesced.

It already does vm_stop though. It doesn't seem to be necessary at all.

> qemu-io: blk_drain_all(), too.

Or just blk_drain(qemuio_blk).

> Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

Not many blk_drain_all() callers either, and that's not a bad thing.

You missed bdrv_reopen_multiple. Maybe it also should not do anything,
and the caller should do begin/end instead.

>>> Note that my blk_drain() implementation doesn't necessarily drain
>>> blk_bs(blk) completely, but only those requests that came from the
>>> specific BlockBackend. I think this is what the callers want, but
>>> if otherwise, it shouldn't be hard to change.
>>
>> Yes, this should be what they want.
>
> Apparently not; block jobs don't complete with it any more. I haven't
> checked in detail, but it makes sense that they can have a BH (e.g. for
> block_job_defer_to_main_loop) without a request being in flight.

That BH should be handled here:

    while (!job->completed) {
        aio_poll(qemu_get_aio_context(), true);
    }

so that should be okay. (Block jobs have been a huge pain every time I
touched bdrv_drain, indeed...)

> So I'm including an unconditional bdrv_drain() again. Or I guess,
> calling aio_poll() unconditionally and including its return value
> in the loop condition would be the cleaner approach?

Note that if you have no block device then you don't have an AioContext
for aio_poll() to use. In fact, we're really close to removing the
AioContext dependency for submitting I/O requests (AioContext remains
only as a place for network drivers to register their sockets' I/O
handlers), so you shouldn't use it from block-backend.c.

That's why I suggested returning NULL from AIO operations if there is
no blk_bs(blk). It may be a little less tidy (I'm more worried about
static analysis lossage than anything else), but it does make some
sense. Most callers are checking blk_is_available or blk_is_inserted
themselves anyway.

Paolo
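[A sketch of the begin/end split Paolo asks for above, following the
shape of Kevin's blk_drain_all() and folding in the AioContext note.
This is an assumption about how it could look; these functions do not
exist in this form.]

void blk_drain_all_begin(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        AioContext *ctx = blk_get_aio_context(blk);

        /* blk_drain() expects the BlockBackend's AioContext to be
         * held, but blk_drain_all() is called with none acquired. */
        aio_context_acquire(ctx);
        blk_drain(blk);
        aio_context_release(ctx);
    }
}

void blk_drain_all_end(void)
{
    bdrv_drain_all_end();
}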
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
> On 08/08/2017 12:02, Kevin Wolf wrote:
>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>> the root cause of this bug is related to this as well:
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>
>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>> functions always WILL have an attached BDS, but this is not always
>>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>>
>>>>> Paolo, can we move the flight counter increment outside of the
>>>>> block-backend layer, is that safe?
>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>> regardless of the throttling timer issue discussed below. BB cannot
>>>> assume that the BDS graph is non-empty.
>>>
>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>> attached BDS? That would make it much easier to fix.
>>
>> Would the proper fix be much more complicated than the following? I
>> must admit that I don't fully understand the current state of affairs
>> with respect to threading, AioContext etc. so I may well be missing
>> something.
>
> Not much, but it's not complete either. The issues I see are that: 1)
> blk_drain_all does not take the new counter into account;

Ok, I think this does the trick:

void blk_drain_all(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        blk_drain(blk);
    }
    bdrv_drain_all_end();
}

> 2) bdrv_drain_all callers need to be audited to see if they should be
> blk_drain_all (or more likely, only device BlockBackends should be
> drained).

qmp_transaction() is unclear to me. It should be changed in some way
anyway because it uses bdrv_drain_all() rather than a begin/end pair.

do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do
this, but I guess blk_drain_all(), too.

block_migration_cleanup() is just lazy and really means a blk_drain()
for its own BlockBackends. blk_drain_all() as the simple conversion.

migration/savevm: Migration wants blk_drain_all() to get the devices
quiesced.

qemu-io: blk_drain_all(), too.

Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

>> Note that my blk_drain() implementation doesn't necessarily drain
>> blk_bs(blk) completely, but only those requests that came from the
>> specific BlockBackend. I think this is what the callers want, but
>> if otherwise, it shouldn't be hard to change.
>
> Yes, this should be what they want.

Apparently not; block jobs don't complete with it any more. I haven't
checked in detail, but it makes sense that they can have a BH (e.g. for
block_job_defer_to_main_loop) without a request being in flight.

So I'm including an unconditional bdrv_drain() again. Or I guess,
calling aio_poll() unconditionally and including its return value
in the loop condition would be the cleaner approach?

Kevin
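[For qmp_transaction(), the pairing Paolo suggests above (drained_begin
in ->prepare, ended in ->clean) could look roughly like this.
ExampleActionState and both callbacks are invented for illustration;
BlkActionState, DO_UPCAST and bdrv_drained_begin/end are real.]

typedef struct ExampleActionState {
    BlkActionState common;
    BlockDriverState *bs;       /* node this action operates on */
} ExampleActionState;

static void example_action_prepare(BlkActionState *common, Error **errp)
{
    ExampleActionState *s = DO_UPCAST(ExampleActionState, common, common);

    /* Quiesce only the node the action touches, instead of a global
     * bdrv_drain_all() up front. */
    bdrv_drained_begin(s->bs);
    /* ... take the snapshot, set up state, possibly set errp ... */
}

static void example_action_clean(BlkActionState *common)
{
    ExampleActionState *s = DO_UPCAST(ExampleActionState, common, common);

    /* Runs after commit or abort, closing the drained section. */
    if (s->bs) {
        bdrv_drained_end(s->bs);
    }
}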
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 12:02, Kevin Wolf wrote:
> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> Would the proper fix be much more complicated than the following? I
> must admit that I don't fully understand the current state of affairs
> with respect to threading, AioContext etc. so I may well be missing
> something.

Not much, but it's not complete either. The issues I see are that:

1) blk_drain_all does not take the new counter into account;

2) bdrv_drain_all callers need to be audited to see if they should be
   blk_drain_all (or more likely, only device BlockBackends should be
   drained).

Paolo

> Note that my blk_drain() implementation doesn't necessarily drain
> blk_bs(blk) completely, but only those requests that came from the
> specific BlockBackend. I think this is what the callers want, but
> if otherwise, it shouldn't be hard to change.

Yes, this should be what they want.

Paolo

> Kevin
>
> diff --git a/block.c b/block.c
> index f1c3e98a4d..07decc3b5f 100644
> [...]
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>> the root cause of this bug is related to this as well:
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>
>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>> functions always WILL have an attached BDS, but this is not always
>>> true, for instance, flushing the cache from an empty CDROM.
>>>
>>> Paolo, can we move the flight counter increment outside of the
>>> block-backend layer, is that safe?
>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>> regardless of the throttling timer issue discussed below. BB cannot
>> assume that the BDS graph is non-empty.
>
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS? That would make it much easier to fix.

Would the proper fix be much more complicated than the following? I
must admit that I don't fully understand the current state of affairs
with respect to threading, AioContext etc. so I may well be missing
something.

Note that my blk_drain() implementation doesn't necessarily drain
blk_bs(blk) completely, but only those requests that came from the
specific BlockBackend. I think this is what the callers want, but
if otherwise, it shouldn't be hard to change.

Kevin

diff --git a/block.c b/block.c
index f1c3e98a4d..07decc3b5f 100644
--- a/block.c
+++ b/block.c
@@ -4563,7 +4563,7 @@ out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    return bs->aio_context;
+    return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
diff --git a/block/block-backend.c b/block/block-backend.c
index 968438c149..649d11b4ef 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,6 +68,9 @@ struct BlockBackend {
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
     int quiesce_counter;
+
+    /* Number of in-flight requests. Accessed with atomic ops. */
+    unsigned int in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1109,6 +1112,16 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+    atomic_inc(&blk->in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+    atomic_dec(&blk->in_flight);
+}
+
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
@@ -1147,7 +1160,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
+        blk_dec_in_flight(acb->rwco.blk);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1168,7 +1181,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1405,8 +1418,16 @@ int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-    if (blk_bs(blk)) {
-        bdrv_drain(blk_bs(blk));
+    AioContext *ctx = blk_get_aio_context(blk);
+
+    while (atomic_read(&blk->in_flight)) {
+        aio_context_acquire(ctx);
+        aio_poll(ctx, false);
+        aio_context_release(ctx);
+
+        if (blk_bs(blk)) {
+            bdrv_drain(blk_bs(blk));
+        }
     }
 }
 
@@ -1453,10 +1474,11 @@ static void send_qmp_error_event(BlockBackend *blk,
                                  bool is_read, int error)
 {
     IoOperationType optype;
+    BlockDriverState *bs = blk_bs(blk);
 
     optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
     qapi_event_send_block_io_error(blk_name(blk),
-                                   bdrv_get_node_name(blk_bs(blk)), optype,
+                                   bs ? bdrv_get_node_name(bs) : "", optype,
                                    action, blk_iostatus_is_enabled(blk),
                                    error == ENOSPC, strerror(error),
                                    &error_abort);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index bfd79ddbdc..ffbfb04271 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -689,6 +689,24 @@ static void test_flush_nodev(void)
     ide_test_quit();
 }
 
+static void test_flush_empty_drive(void)
+{
+    QPCIDevice *dev;
+    QPCIBar bmdma_bar, ide_bar;
+
+    ide_test_start("-device ide-cd,bus=ide.0");
+    dev = get_pci_device(&bmdma_bar, &ide_bar);
+
+    /* FLUSH CACHE command on device 0 */
+    qpci_io_writeb(dev, ide_bar, reg_device, 0);
+    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
+
+    /* Just testing that qemu doesn't crash... */
+
+    free_pci_device(dev);
+    ide_test_quit();
+}
+
 static void
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>> the root cause of this bug is related to this as well:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>
>> From commit 99723548 we started assuming (incorrectly?) that blk_
>> functions always WILL have an attached BDS, but this is not always
>> true, for instance, flushing the cache from an empty CDROM.
>>
>> Paolo, can we move the flight counter increment outside of the
>> block-backend layer, is that safe?
> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> regardless of the throttling timer issue discussed below. BB cannot
> assume that the BDS graph is non-empty.

Can we make bdrv_aio_* return NULL (even temporarily) if there is no
attached BDS? That would make it much easier to fix.

Paolo
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Mon, Jul 17, 2017 at 5:43 PM, John Snow wrote:
> On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert"
>>>>
>>>> There's a rare exit seg if the guest is accessing IO during exit.
>>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>>> bs. This was added recently in 99723548 but I don't see it
>>>> as the cause.
>>>>
>>>> Flip vl.c around so we pause the cpus before closing the block
>>>> devices, that way we shouldn't have anything trying to access them
>>>> when they're gone.
>>>>
>>>> This was originally Red Hat bz
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>>
>>>> Signed-off-by: Dr. David Alan Gilbert
>>>> Reported-by: Cong Li
>>>>
>>>> --
>>>> This is a very rare race, I'll leave it running in a loop to see if
>>>> we hit anything else and to check this really fixes it.
>>>>
>>>> I do worry if there are other cases that can trigger this - e.g.
>>>> hot-unplug or ejecting a CD.
>>>>
>>>> ---
>>>>  vl.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Reviewed-by: Stefan Hajnoczi
>>
>> Thanks; and the test I left running seems solid - ~12k runs
>> over the weekend with no seg.
>>
>> Dave
>
> the root cause of this bug is related to this as well:
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>
> From commit 99723548 we started assuming (incorrectly?) that blk_
> functions always WILL have an attached BDS, but this is not always
> true, for instance, flushing the cache from an empty CDROM.
>
> Paolo, can we move the flight counter increment outside of the
> block-backend layer, is that safe?

I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
regardless of the throttling timer issue discussed below. BB cannot
assume that the BDS graph is non-empty.

Stefan
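[The failure path under discussion, reduced to an annotated sketch;
simplified from block/block-backend.c as of commit 99723548, with the
parameter list elided.]

/* block/block-backend.c (simplified): every AIO submission bumps the
 * in-flight counter of the attached BDS. */
static BlockAIOCB *blk_aio_prwv(BlockBackend *blk /* , ... */)
{
    bdrv_inc_in_flight(blk_bs(blk));  /* blk_bs() returns NULL when no
                                       * medium is inserted (empty
                                       * CD-ROM), so this dereferences
                                       * NULL via atomic_inc(&bs->in_flight) */
    /* ... */
}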
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Aug 3, 2017 at 11:36 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> - Original Message -
>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> To: "Alberto Garcia" <be...@igalia.com>
>> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com
>> Sent: Thursday, August 3, 2017 6:45:17 PM
>> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing
>> block devices
>>
>> * Alberto Garcia (be...@igalia.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>>> wrote:
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>>>      replay_disable_events();
>>>>      iothread_stop_all();
>>>>
>>>> -    bdrv_close_all();
>>>>      pause_all_vcpus();
>>>> +    bdrv_close_all();
>>>>      res_free();
>>>
>>> I haven't debugged it yet, but in my computer iotest 093 stops working
>>> (it never finishes) after this change.
>>
>> Yes, I can reproduce that here (I've got to explicitly run 093 - it
>> doesn't do it automatically for me):
>
> The culprit so to speak is this:
>
>     if (qtest_enabled()) {
>         /* For testing block IO throttling only */
>         tg->clock_type = QEMU_CLOCK_VIRTUAL;
>     }
>
> So after pause_all_vcpus(), the clock doesn't advance and
> bdrv_close_all hangs. Should throttling be disabled by the time
> bdrv_close drains the BlockDriverState, and likewise for
> bdrv_close_all?

bdrv_drain_all() is called before blk_remove_all_bs(), so throttling is
still enabled while draining:

void bdrv_close_all(void)
{
    block_job_cancel_sync_all();
    nbd_export_close_all();

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
    bdrv_drain_all();

    blk_remove_all_bs();

Perhaps we need a bdrv_throttle_disable_all() function. Manos is moving
throttling to a block driver, so there is no longer a 1:1 relationship
between BB and throttling - there might be multiple throttling nodes in
a BDS graph.

Stefan
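[A rough sketch of what bdrv_throttle_disable_all() could look like
under the current model, where throttling still hangs off the
BlockBackend. blk_all_next(), blk_get_public() and
blk_io_limits_disable() are real APIs of this era; the function itself
and the exact check are assumptions.]

/* Hypothetical: drop every BlockBackend's I/O limits so that draining
 * in bdrv_close_all() cannot get stuck behind a throttled request
 * whose timer never fires (e.g. a stopped QEMU_CLOCK_VIRTUAL under
 * qtest). */
void bdrv_throttle_disable_all(void)
{
    BlockBackend *blk = NULL;

    while ((blk = blk_all_next(blk)) != NULL) {
        if (blk_get_public(blk)->throttle_state) {
            blk_io_limits_disable(blk);
        }
    }
}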
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
- Original Message -
> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> To: "Alberto Garcia" <be...@igalia.com>
> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com
> Sent: Thursday, August 3, 2017 6:45:17 PM
> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing
> block devices
>
> * Alberto Garcia (be...@igalia.com) wrote:
>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>> wrote:
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>>      replay_disable_events();
>>>      iothread_stop_all();
>>>
>>> -    bdrv_close_all();
>>>      pause_all_vcpus();
>>> +    bdrv_close_all();
>>>      res_free();
>>
>> I haven't debugged it yet, but in my computer iotest 093 stops working
>> (it never finishes) after this change.
>
> Yes, I can reproduce that here (I've got to explicitly run 093 - it
> doesn't do it automatically for me):

The culprit so to speak is this:

    if (qtest_enabled()) {
        /* For testing block IO throttling only */
        tg->clock_type = QEMU_CLOCK_VIRTUAL;
    }

So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all
hangs. Should throttling be disabled by the time bdrv_close drains the
BlockDriverState, and likewise for bdrv_close_all?

Paolo
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
* Alberto Garcia (be...@igalia.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>      replay_disable_events();
>>      iothread_stop_all();
>>
>> -    bdrv_close_all();
>>      pause_all_vcpus();
>> +    bdrv_close_all();
>>      res_free();
>
> I haven't debugged it yet, but in my computer iotest 093 stops working
> (it never finishes) after this change.

Yes, I can reproduce that here (I've got to explicitly run 093 - it
doesn't do it automatically for me):

(gdb) where
#0  0x7feee3780b76 in ppoll () at /lib64/libc.so.6
#1  0x564aed293199 in ppoll (__ss=0x0, __timeout=0x0,
    __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x564aed293199 in qemu_poll_ns (fds=<optimized out>,
    nfds=<optimized out>, timeout=<optimized out>)
    at /home/dgilbert/git/qemu/util/qemu-timer.c:322
#3  0x564aed294de1 in aio_poll (ctx=ctx@entry=0x564aefa578e0,
    blocking=<optimized out>)
    at /home/dgilbert/git/qemu/util/aio-posix.c:622
#4  0x564aed215924 in bdrv_drain_recurse (bs=bs@entry=0x564aefa6c320)
    at /home/dgilbert/git/qemu/block/io.c:188
#5  0x564aed216345 in bdrv_drain_all_begin ()
    at /home/dgilbert/git/qemu/block/io.c:353
#6  0x564aed2164b9 in bdrv_drain_all ()
    at /home/dgilbert/git/qemu/block/io.c:382
#7  0x564aed1c9b63 in bdrv_close_all ()
    at /home/dgilbert/git/qemu/block.c:3113
#8  0x564aeceb04fe in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at /home/dgilbert/git/qemu/vl.c:4795

Dave

> Berto

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> --- a/vl.c
> +++ b/vl.c
> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>      replay_disable_events();
>      iothread_stop_all();
>
> -    bdrv_close_all();
>      pause_all_vcpus();
> +    bdrv_close_all();
>      res_free();

I haven't debugged it yet, but in my computer iotest 093 stops working
(it never finishes) after this change.

Berto
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert"
>>>
>>> There's a rare exit seg if the guest is accessing IO during exit.
>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>> bs. This was added recently in 99723548 but I don't see it
>>> as the cause.
>>>
>>> Flip vl.c around so we pause the cpus before closing the block
>>> devices, that way we shouldn't have anything trying to access them
>>> when they're gone.
>>>
>>> This was originally Red Hat bz
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>
>>> Signed-off-by: Dr. David Alan Gilbert
>>> Reported-by: Cong Li
>>>
>>> --
>>> This is a very rare race, I'll leave it running in a loop to see if
>>> we hit anything else and to check this really fixes it.
>>>
>>> I do worry if there are other cases that can trigger this - e.g.
>>> hot-unplug or ejecting a CD.
>>>
>>> ---
>>>  vl.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Reviewed-by: Stefan Hajnoczi
>
> Thanks; and the test I left running seems solid - ~12k runs
> over the weekend with no seg.
>
> Dave

the root cause of this bug is related to this as well:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html

From commit 99723548 we started assuming (incorrectly?) that blk_
functions always WILL have an attached BDS, but this is not always true,
for instance, flushing the cache from an empty CDROM.

Paolo, can we move the flight counter increment outside of the
block-backend layer, is that safe?

--js
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert"
>>
>> There's a rare exit seg if the guest is accessing IO during exit.
>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>> bs. This was added recently in 99723548 but I don't see it
>> as the cause.
>>
>> Flip vl.c around so we pause the cpus before closing the block
>> devices, that way we shouldn't have anything trying to access them
>> when they're gone.
>>
>> This was originally Red Hat bz
>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>
>> Signed-off-by: Dr. David Alan Gilbert
>> Reported-by: Cong Li
>>
>> --
>> This is a very rare race, I'll leave it running in a loop to see if
>> we hit anything else and to check this really fixes it.
>>
>> I do worry if there are other cases that can trigger this - e.g.
>> hot-unplug or ejecting a CD.
>>
>> ---
>>  vl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Stefan Hajnoczi

Thanks; and the test I left running seems solid - ~12k runs
over the weekend with no seg.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"
>
> There's a rare exit seg if the guest is accessing IO during exit.
> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
> bs. This was added recently in 99723548 but I don't see it
> as the cause.
>
> Flip vl.c around so we pause the cpus before closing the block
> devices, that way we shouldn't have anything trying to access them
> when they're gone.
>
> This was originally Red Hat bz
> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>
> Signed-off-by: Dr. David Alan Gilbert
> Reported-by: Cong Li
>
> --
> This is a very rare race, I'll leave it running in a loop to see if
> we hit anything else and to check this really fixes it.
>
> I do worry if there are other cases that can trigger this - e.g.
> hot-unplug or ejecting a CD.
>
> ---
>  vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi