Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 14:53, Stefan Hajnoczi wrote:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> There are many blk_aio_*() callers. Returning NULL forces them to
> perform extra error handling.

Most of them are for non-removable devices and check for a non-empty BB
at startup. The others (ide-cd, scsi-cd, sd) check the BlockBackend's
blk_is_available or blk_is_inserted state themselves.

It does require some auditing, of course, but remember that anything
that would return NULL is already crashing in bdrv_inc_in_flight today.
If it were that bad, we'd be seeing NULL pointer dereferences left and
right.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback? I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

No, I mean undoing that in 2.11.

Paolo
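[For reference, a minimal sketch of the caller-side pattern Paolo
describes, assuming the proposed NULL-return semantics were adopted.
blk_is_available() and blk_aio_flush() are the real QEMU APIs;
device_flush() itself is hypothetical.]

/* Hypothetical device callback, mirroring how ide-cd/scsi-cd guard
 * requests today; only blk_is_available() and blk_aio_flush() are
 * real QEMU APIs, the rest is illustrative. */
static void device_flush(BlockBackend *blk,
                         BlockCompletionFunc *cb, void *opaque)
{
    BlockAIOCB *acb;

    if (!blk_is_available(blk)) {
        /* No medium inserted: fail the request up front instead of
         * calling into the BB layer with an empty BDS graph. */
        cb(opaque, -ENOMEDIUM);
        return;
    }

    acb = blk_aio_flush(blk, cb, opaque);
    if (!acb) {
        /* The proposed NULL return: the medium went away in between. */
        cb(opaque, -ENOMEDIUM);
    }
}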
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 08.08.2017 um 14:53 hat Stefan Hajnoczi geschrieben:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> There are many blk_aio_*() callers. Returning NULL forces them to
> perform extra error handling.

Yes, that's my concern. We removed NULL returns a long time ago. Most
callers probably don't check for it any more.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback? I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

This doesn't help. As soon as you involve BHs, you need to consider
them during blk_drain(), otherwise the drain can return too early. And
if you want to consider them during blk_drain()... well, I made an
attempt, maybe we can make it work with some more changes. But I'm
starting to see that it's not a trivial change; though admittedly, the
NULL return thing doesn't look trivial either.

Kevin
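[To make the hazard concrete, a sketch of an -ENOMEDIUM completion
deferred to a one-shot BH, showing the accounting blk_drain() would
need so it cannot return before the callback has run. This is an
assumption about the shape such code would take, not a patch from this
thread; the helper names follow Kevin's patch quoted further down.]

/* Sketch: unless the BH is counted in blk->in_flight (taken before
 * scheduling, dropped only after the callback ran), blk_drain() can
 * return before the guest-visible completion has happened. */
static void enomedium_bh(void *opaque)
{
    BlkAioEmAIOCB *acb = opaque;

    acb->common.cb(acb->common.opaque, -ENOMEDIUM);
    blk_dec_in_flight(acb->rwco.blk);   /* only after the callback */
    qemu_aio_unref(acb);
}

static BlockAIOCB *blk_aio_fail_enomedium(BlockBackend *blk,
                                          BlockCompletionFunc *cb,
                                          void *opaque)
{
    BlkAioEmAIOCB *acb = blk_aio_get(&blk_aio_em_aiocb_info, blk,
                                     cb, opaque);

    blk_inc_in_flight(blk);             /* keeps blk_drain() waiting */
    aio_bh_schedule_oneshot(blk_get_aio_context(blk), enomedium_bh, acb);
    return &acb->common;
}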
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>> the root cause of this bug is related to this as well:
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>
>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>> functions always WILL have an attached BDS, but this is not always
>>> true, for instance, flushing the cache from an empty CDROM.
>>>
>>> Paolo, can we move the flight counter increment outside of the
>>> block-backend layer, is that safe?
>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>> regardless of the throttling timer issue discussed below. BB cannot
>> assume that the BDS graph is non-empty.
>
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS? That would make it much easier to fix.

There are many blk_aio_*() callers. Returning NULL forces them to
perform extra error handling.

When you say "temporarily" do you mean it returns NULL but schedules a
one-shot BH to invoke the callback? I wonder if we can use a singleton
aiocb instead of NULL for -ENOMEDIUM errors.

Stefan
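[One possible shape for the singleton idea, purely as a sketch of the
proposal; nothing like this exists in the tree and all names here are
invented. The shared aiocb can only be a handle for callers, so the
callback context still has to be carried separately.]

typedef struct {
    BlockCompletionFunc *cb;
    void *opaque;
} ENoMediumCompletion;

static void enomedium_cb_bh(void *opaque)
{
    ENoMediumCompletion *c = opaque;

    c->cb(c->opaque, -ENOMEDIUM);
    g_free(c);
}

/* Shared dummy handle returned for every -ENOMEDIUM failure. Callers
 * must never cancel or unref it; that is one of the open questions
 * with the idea. */
static BlockAIOCB blk_enomedium_acb;

static BlockAIOCB *blk_aio_enomedium(BlockBackend *blk,
                                     BlockCompletionFunc *cb,
                                     void *opaque)
{
    ENoMediumCompletion *c = g_new(ENoMediumCompletion, 1);

    c->cb = cb;
    c->opaque = opaque;
    aio_bh_schedule_oneshot(blk_get_aio_context(blk), enomedium_cb_bh, c);
    return &blk_enomedium_acb;
}

[As Kevin points out in his reply, the catch is that the one-shot BH
then has to be taken into account by blk_drain().]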
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 13:56, Kevin Wolf wrote:
> Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
>> On 08/08/2017 12:02, Kevin Wolf wrote:
>>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>>> the root cause of this bug is related to this as well:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>>
>>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>>> functions always WILL have an attached BDS, but this is not always
>>>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>>>
>>>>>> Paolo, can we move the flight counter increment outside of the
>>>>>> block-backend layer, is that safe?
>>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>>> regardless of the throttling timer issue discussed below. BB cannot
>>>>> assume that the BDS graph is non-empty.
>>>>
>>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>>> attached BDS? That would make it much easier to fix.
>>>
>>> Would the proper fix be much more complicated than the following? I
>>> must admit that I don't fully understand the current state of affairs
>>> with respect to threading, AioContext etc. so I may well be missing
>>> something.
>>
>> Not much, but it's not complete either. The issues I see are that: 1)
>> blk_drain_all does not take the new counter into account;
>
> Ok, I think this does the trick:
>
> void blk_drain_all(void)
> {
>     BlockBackend *blk = NULL;
>
>     bdrv_drain_all_begin();
>     while ((blk = blk_all_next(blk)) != NULL) {
>         blk_drain(blk);
>     }
>     bdrv_drain_all_end();
> }

Small note: blk_drain_all is called while no AioContext has been
acquired, while blk_drain requires you to acquire blk's AioContext.

And this of course should be split into blk_drain_all_begin/
blk_drain_all_end; see the sketch after this message.

>> 2) bdrv_drain_all callers need to be audited to see if they should be
>> blk_drain_all (or more likely, only device BlockBackends should be
>> drained).
>
> qmp_transaction() is unclear to me. It should be changed in some way
> anyway because it uses bdrv_drain_all() rather than a begin/end pair.

I think every ->prepare should do drained_begin and ->clean should end
the section. Most of them are doing it already.

> do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

Or just all devices. From the October 2016 discussion, "qdev drive
properties would install a vmstate notifier to quiesce their own
BlockBackend rather than relying on a blind bdrv_drain_all from
vm_stop" (though there'll surely be some non-qdevified straggler).

> xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do
> this, but I guess blk_drain_all(), too.

It seems to be an advisory operation triggered by balloon deflation.
Whatever.

> block_migration_cleanup() is just lazy and really means a blk_drain()
> for its own BlockBackends. blk_drain_all() as the simple conversion.
>
> migration/savevm: Migration wants blk_drain_all() to get the devices
> quiesced.

It already does vm_stop though. It doesn't seem to be necessary at all.

> qemu-io: blk_drain_all(), too.

Or just blk_drain(qemuio_blk).

> Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

Not many blk_drain_all() callers either, and that's not a bad thing.

You missed bdrv_reopen_multiple. Maybe it also should not do anything,
and the caller should do begin/end instead.

>>> Note that my blk_drain() implementation doesn't necessarily drain
>>> blk_bs(blk) completely, but only those requests that came from the
>>> specific BlockBackend. I think this is what the callers want, but
>>> if otherwise, it shouldn't be hard to change.
>>
>> Yes, this should be what they want.
>
> Apparently not; block jobs don't complete with it any more. I haven't
> checked in detail, but it makes sense that they can have a BH (e.g. for
> block_job_defer_to_main_loop) without a request being in flight.

That BH should be handled here:

    while (!job->completed) {
        aio_poll(qemu_get_aio_context(), true);
    }

so that should be okay. (Block jobs have been a huge pain every time I
touched bdrv_drain, indeed...)

> So I'm including an unconditional bdrv_drain() again. Or I guess,
> calling aio_poll() unconditionally and including its return value
> in the loop condition would be the cleaner approach?

Note that if you have no block device then you don't have an AioContext
for aio_poll() to use. In fact, we're really close to removing the
AioContext dependency for submitting I/O requests (AioContext remains
only as a place for network drivers to register their sockets' I/O
handlers), so you shouldn't use it from block-backend.c.

That's why I suggested returning NULL from AIO operations if there is
no blk_bs(blk). It may be a little less tidy (I'm more worried about
static analysis lossage than anything else), but it does make some
sense. Most callers are checking blk_is_available or blk_is_inserted
themselves anyway.

Paolo
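[A sketch of the begin/end split Paolo asks for above, following the
shape of Kevin's blk_drain_all() and folding in the AioContext note.
This is an assumption about how it could look; these functions do not
exist in this form.]

void blk_drain_all_begin(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        AioContext *ctx = blk_get_aio_context(blk);

        /* blk_drain() expects the BlockBackend's AioContext to be
         * held, but blk_drain_all() is called with none acquired. */
        aio_context_acquire(ctx);
        blk_drain(blk);
        aio_context_release(ctx);
    }
}

void blk_drain_all_end(void)
{
    bdrv_drain_all_end();
}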
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
> On 08/08/2017 12:02, Kevin Wolf wrote:
>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>> the root cause of this bug is related to this as well:
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>
>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>> functions always WILL have an attached BDS, but this is not always
>>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>>
>>>>> Paolo, can we move the flight counter increment outside of the
>>>>> block-backend layer, is that safe?
>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>> regardless of the throttling timer issue discussed below. BB cannot
>>>> assume that the BDS graph is non-empty.
>>>
>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>> attached BDS? That would make it much easier to fix.
>>
>> Would the proper fix be much more complicated than the following? I
>> must admit that I don't fully understand the current state of affairs
>> with respect to threading, AioContext etc. so I may well be missing
>> something.
>
> Not much, but it's not complete either. The issues I see are that: 1)
> blk_drain_all does not take the new counter into account;

Ok, I think this does the trick:

void blk_drain_all(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        blk_drain(blk);
    }
    bdrv_drain_all_end();
}

> 2) bdrv_drain_all callers need to be audited to see if they should be
> blk_drain_all (or more likely, only device BlockBackends should be
> drained).

qmp_transaction() is unclear to me. It should be changed in some way
anyway because it uses bdrv_drain_all() rather than a begin/end pair.

do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do
this, but I guess blk_drain_all(), too.

block_migration_cleanup() is just lazy and really means a blk_drain()
for its own BlockBackends. blk_drain_all() as the simple conversion.

migration/savevm: Migration wants blk_drain_all() to get the devices
quiesced.

qemu-io: blk_drain_all(), too.

Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

>> Note that my blk_drain() implementation doesn't necessarily drain
>> blk_bs(blk) completely, but only those requests that came from the
>> specific BlockBackend. I think this is what the callers want, but
>> if otherwise, it shouldn't be hard to change.
>
> Yes, this should be what they want.

Apparently not; block jobs don't complete with it any more. I haven't
checked in detail, but it makes sense that they can have a BH (e.g. for
block_job_defer_to_main_loop) without a request being in flight.

So I'm including an unconditional bdrv_drain() again. Or I guess,
calling aio_poll() unconditionally and including its return value
in the loop condition would be the cleaner approach?

Kevin
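[For qmp_transaction(), the pairing Paolo suggests above (drained_begin
in ->prepare, ended in ->clean) could look roughly like this.
ExampleActionState and both callbacks are invented for illustration;
BlkActionState, DO_UPCAST and bdrv_drained_begin/end are real.]

typedef struct ExampleActionState {
    BlkActionState common;
    BlockDriverState *bs;       /* node this action operates on */
} ExampleActionState;

static void example_action_prepare(BlkActionState *common, Error **errp)
{
    ExampleActionState *s = DO_UPCAST(ExampleActionState, common, common);

    /* Quiesce only the node the action touches, instead of a global
     * bdrv_drain_all() up front. */
    bdrv_drained_begin(s->bs);
    /* ... take the snapshot, set up state, possibly set errp ... */
}

static void example_action_clean(BlkActionState *common)
{
    ExampleActionState *s = DO_UPCAST(ExampleActionState, common, common);

    /* Runs after commit or abort, closing the drained section. */
    if (s->bs) {
        bdrv_drained_end(s->bs);
    }
}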
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 08/08/2017 12:02, Kevin Wolf wrote:
> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always
>>>> true, for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below. BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS? That would make it much easier to fix.
>
> Would the proper fix be much more complicated than the following? I
> must admit that I don't fully understand the current state of affairs
> with respect to threading, AioContext etc. so I may well be missing
> something.

Not much, but it's not complete either. The issues I see are that:

1) blk_drain_all does not take the new counter into account;

2) bdrv_drain_all callers need to be audited to see if they should be
   blk_drain_all (or more likely, only device BlockBackends should be
   drained).

Paolo

> Note that my blk_drain() implementation doesn't necessarily drain
> blk_bs(blk) completely, but only those requests that came from the
> specific BlockBackend. I think this is what the callers want, but
> if otherwise, it shouldn't be hard to change.

Yes, this should be what they want.

Paolo

> Kevin
>
> diff --git a/block.c b/block.c
> index f1c3e98a4d..07decc3b5f 100644
> [...]
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>> the root cause of this bug is related to this as well:
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>
>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>> functions always WILL have an attached BDS, but this is not always
>>> true, for instance, flushing the cache from an empty CDROM.
>>>
>>> Paolo, can we move the flight counter increment outside of the
>>> block-backend layer, is that safe?
>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>> regardless of the throttling timer issue discussed below. BB cannot
>> assume that the BDS graph is non-empty.
>
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS? That would make it much easier to fix.

Would the proper fix be much more complicated than the following? I
must admit that I don't fully understand the current state of affairs
with respect to threading, AioContext etc. so I may well be missing
something.

Note that my blk_drain() implementation doesn't necessarily drain
blk_bs(blk) completely, but only those requests that came from the
specific BlockBackend. I think this is what the callers want, but
if otherwise, it shouldn't be hard to change.

Kevin

diff --git a/block.c b/block.c
index f1c3e98a4d..07decc3b5f 100644
--- a/block.c
+++ b/block.c
@@ -4563,7 +4563,7 @@ out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    return bs->aio_context;
+    return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
diff --git a/block/block-backend.c b/block/block-backend.c
index 968438c149..649d11b4ef 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,6 +68,9 @@ struct BlockBackend {
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
     int quiesce_counter;
+
+    /* Number of in-flight requests. Accessed with atomic ops. */
+    unsigned int in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1109,6 +1112,16 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+    atomic_inc(&blk->in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+    atomic_dec(&blk->in_flight);
+}
+
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
@@ -1147,7 +1160,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
+        blk_dec_in_flight(acb->rwco.blk);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1168,7 +1181,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1405,8 +1418,16 @@ int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-    if (blk_bs(blk)) {
-        bdrv_drain(blk_bs(blk));
+    AioContext *ctx = blk_get_aio_context(blk);
+
+    while (atomic_read(&blk->in_flight)) {
+        aio_context_acquire(ctx);
+        aio_poll(ctx, false);
+        aio_context_release(ctx);
+
+        if (blk_bs(blk)) {
+            bdrv_drain(blk_bs(blk));
+        }
     }
 }
 
@@ -1453,10 +1474,11 @@ static void send_qmp_error_event(BlockBackend *blk,
                                  bool is_read, int error)
 {
     IoOperationType optype;
+    BlockDriverState *bs = blk_bs(blk);
 
     optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
     qapi_event_send_block_io_error(blk_name(blk),
-                                   bdrv_get_node_name(blk_bs(blk)), optype,
+                                   bs ? bdrv_get_node_name(bs) : "", optype,
                                    action, blk_iostatus_is_enabled(blk),
                                    error == ENOSPC, strerror(error),
                                    &error_abort);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index bfd79ddbdc..ffbfb04271 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -689,6 +689,24 @@ static void test_flush_nodev(void)
     ide_test_quit();
 }
 
+static void test_flush_empty_drive(void)
+{
+    QPCIDevice *dev;
+    QPCIBar bmdma_bar, ide_bar;
+
+    ide_test_start("-device ide-cd,bus=ide.0");
+    dev = get_pci_device(&bmdma_bar, &ide_bar);
+
+    /* FLUSH CACHE command on device 0 */
+    qpci_io_writeb(dev, ide_bar, reg_device, 0);
+    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
+
+    /* Just testing that qemu doesn't crash... */
+
+    free_pci_device(dev);
+    ide_test_quit();
+}
+
 static void
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>> the root cause of this bug is related to this as well:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>
>> From commit 99723548 we started assuming (incorrectly?) that blk_
>> functions always WILL have an attached BDS, but this is not always
>> true, for instance, flushing the cache from an empty CDROM.
>>
>> Paolo, can we move the flight counter increment outside of the
>> block-backend layer, is that safe?
> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> regardless of the throttling timer issue discussed below. BB cannot
> assume that the BDS graph is non-empty.

Can we make bdrv_aio_* return NULL (even temporarily) if there is no
attached BDS? That would make it much easier to fix.

Paolo
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Mon, Jul 17, 2017 at 5:43 PM, John Snow wrote:
> On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert"
>>>>
>>>> There's a rare exit seg if the guest is accessing IO during exit.
>>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>>> bs. This was added recently in 99723548 but I don't see it
>>>> as the cause.
>>>>
>>>> Flip vl.c around so we pause the cpus before closing the block
>>>> devices, that way we shouldn't have anything trying to access them
>>>> when they're gone.
>>>>
>>>> This was originally Red Hat bz
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>>
>>>> Signed-off-by: Dr. David Alan Gilbert
>>>> Reported-by: Cong Li
>>>>
>>>> --
>>>> This is a very rare race, I'll leave it running in a loop to see if
>>>> we hit anything else and to check this really fixes it.
>>>>
>>>> I do worry if there are other cases that can trigger this - e.g.
>>>> hot-unplug or ejecting a CD.
>>>>
>>>> ---
>>>>  vl.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Reviewed-by: Stefan Hajnoczi
>>
>> Thanks; and the test I left running seems solid - ~12k runs
>> over the weekend with no seg.
>>
>> Dave
>
> the root cause of this bug is related to this as well:
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>
> From commit 99723548 we started assuming (incorrectly?) that blk_
> functions always WILL have an attached BDS, but this is not always
> true, for instance, flushing the cache from an empty CDROM.
>
> Paolo, can we move the flight counter increment outside of the
> block-backend layer, is that safe?

I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
regardless of the throttling timer issue discussed below. BB cannot
assume that the BDS graph is non-empty.

Stefan
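[The failure path under discussion, reduced to an annotated sketch;
simplified from block/block-backend.c as of commit 99723548, with the
parameter list elided.]

/* block/block-backend.c (simplified): every AIO submission bumps the
 * in-flight counter of the attached BDS. */
static BlockAIOCB *blk_aio_prwv(BlockBackend *blk /* , ... */)
{
    bdrv_inc_in_flight(blk_bs(blk));  /* blk_bs() returns NULL when no
                                       * medium is inserted (empty
                                       * CD-ROM), so this dereferences
                                       * NULL via atomic_inc(&bs->in_flight) */
    /* ... */
}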
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Aug 3, 2017 at 11:36 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> - Original Message -
>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> To: "Alberto Garcia" <be...@igalia.com>
>> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com
>> Sent: Thursday, August 3, 2017 6:45:17 PM
>> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing
>> block devices
>>
>> * Alberto Garcia (be...@igalia.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>>> wrote:
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>>>      replay_disable_events();
>>>>      iothread_stop_all();
>>>>
>>>> -    bdrv_close_all();
>>>>      pause_all_vcpus();
>>>> +    bdrv_close_all();
>>>>      res_free();
>>>
>>> I haven't debugged it yet, but in my computer iotest 093 stops working
>>> (it never finishes) after this change.
>>
>> Yes, I can reproduce that here (I've got to explicitly run 093 - it
>> doesn't do it automatically for me):
>
> The culprit so to speak is this:
>
>     if (qtest_enabled()) {
>         /* For testing block IO throttling only */
>         tg->clock_type = QEMU_CLOCK_VIRTUAL;
>     }
>
> So after pause_all_vcpus(), the clock doesn't advance and
> bdrv_close_all hangs. Should throttling be disabled by the time
> bdrv_close drains the BlockDriverState, and likewise for
> bdrv_close_all?

bdrv_drain_all() is called before blk_remove_all_bs(), so throttling is
still enabled while draining:

void bdrv_close_all(void)
{
    block_job_cancel_sync_all();
    nbd_export_close_all();

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
    bdrv_drain_all();

    blk_remove_all_bs();

Perhaps we need a bdrv_throttle_disable_all() function. Manos is moving
throttling to a block driver, so there is no longer a 1:1 relationship
between BB and throttling - there might be multiple throttling nodes in
a BDS graph.

Stefan
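[A rough sketch of what bdrv_throttle_disable_all() could look like
under the current model, where throttling still hangs off the
BlockBackend. blk_all_next(), blk_get_public() and
blk_io_limits_disable() are real APIs of this era; the function itself
and the exact check are assumptions.]

/* Hypothetical: drop every BlockBackend's I/O limits so that draining
 * in bdrv_close_all() cannot get stuck behind a throttled request
 * whose timer never fires (e.g. a stopped QEMU_CLOCK_VIRTUAL under
 * qtest). */
void bdrv_throttle_disable_all(void)
{
    BlockBackend *blk = NULL;

    while ((blk = blk_all_next(blk)) != NULL) {
        if (blk_get_public(blk)->throttle_state) {
            blk_io_limits_disable(blk);
        }
    }
}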
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
- Original Message -
> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> To: "Alberto Garcia" <be...@igalia.com>
> Cc: qemu-devel@nongnu.org, pbonz...@redhat.com, js...@redhat.com
> Sent: Thursday, August 3, 2017 6:45:17 PM
> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing
> block devices
>
> * Alberto Garcia (be...@igalia.com) wrote:
>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>> wrote:
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>>      replay_disable_events();
>>>      iothread_stop_all();
>>>
>>> -    bdrv_close_all();
>>>      pause_all_vcpus();
>>> +    bdrv_close_all();
>>>      res_free();
>>
>> I haven't debugged it yet, but in my computer iotest 093 stops working
>> (it never finishes) after this change.
>
> Yes, I can reproduce that here (I've got to explicitly run 093 - it
> doesn't do it automatically for me):

The culprit so to speak is this:

    if (qtest_enabled()) {
        /* For testing block IO throttling only */
        tg->clock_type = QEMU_CLOCK_VIRTUAL;
    }

So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all
hangs. Should throttling be disabled by the time bdrv_close drains the
BlockDriverState, and likewise for bdrv_close_all?

Paolo
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
* Alberto Garcia (be...@igalia.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>>      replay_disable_events();
>>      iothread_stop_all();
>>
>> -    bdrv_close_all();
>>      pause_all_vcpus();
>> +    bdrv_close_all();
>>      res_free();
>
> I haven't debugged it yet, but in my computer iotest 093 stops working
> (it never finishes) after this change.

Yes, I can reproduce that here (I've got to explicitly run 093 - it
doesn't do it automatically for me):

(gdb) where
#0  0x7feee3780b76 in ppoll () at /lib64/libc.so.6
#1  0x564aed293199 in ppoll (__ss=0x0, __timeout=0x0,
    __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x564aed293199 in qemu_poll_ns (fds=<optimized out>,
    nfds=<optimized out>, timeout=<optimized out>)
    at /home/dgilbert/git/qemu/util/qemu-timer.c:322
#3  0x564aed294de1 in aio_poll (ctx=ctx@entry=0x564aefa578e0,
    blocking=<optimized out>)
    at /home/dgilbert/git/qemu/util/aio-posix.c:622
#4  0x564aed215924 in bdrv_drain_recurse (bs=bs@entry=0x564aefa6c320)
    at /home/dgilbert/git/qemu/block/io.c:188
#5  0x564aed216345 in bdrv_drain_all_begin ()
    at /home/dgilbert/git/qemu/block/io.c:353
#6  0x564aed2164b9 in bdrv_drain_all ()
    at /home/dgilbert/git/qemu/block/io.c:382
#7  0x564aed1c9b63 in bdrv_close_all ()
    at /home/dgilbert/git/qemu/block.c:3113
#8  0x564aeceb04fe in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at /home/dgilbert/git/qemu/vl.c:4795

Dave

> Berto

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> --- a/vl.c
> +++ b/vl.c
> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>      replay_disable_events();
>      iothread_stop_all();
>
> -    bdrv_close_all();
>      pause_all_vcpus();
> +    bdrv_close_all();
>      res_free();

I haven't debugged it yet, but in my computer iotest 093 stops working
(it never finishes) after this change.

Berto
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert"
>>>
>>> There's a rare exit seg if the guest is accessing IO during exit.
>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>> bs. This was added recently in 99723548 but I don't see it
>>> as the cause.
>>>
>>> Flip vl.c around so we pause the cpus before closing the block
>>> devices, that way we shouldn't have anything trying to access them
>>> when they're gone.
>>>
>>> This was originally Red Hat bz
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>
>>> Signed-off-by: Dr. David Alan Gilbert
>>> Reported-by: Cong Li
>>>
>>> --
>>> This is a very rare race, I'll leave it running in a loop to see if
>>> we hit anything else and to check this really fixes it.
>>>
>>> I do worry if there are other cases that can trigger this - e.g.
>>> hot-unplug or ejecting a CD.
>>>
>>> ---
>>>  vl.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Reviewed-by: Stefan Hajnoczi
>
> Thanks; and the test I left running seems solid - ~12k runs
> over the weekend with no seg.
>
> Dave

the root cause of this bug is related to this as well:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html

From commit 99723548 we started assuming (incorrectly?) that blk_
functions always WILL have an attached BDS, but this is not always true,
for instance, flushing the cache from an empty CDROM.

Paolo, can we move the flight counter increment outside of the
block-backend layer, is that safe?

--js
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
* Stefan Hajnoczi (stefa...@gmail.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert"
>>
>> There's a rare exit seg if the guest is accessing IO during exit.
>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>> bs. This was added recently in 99723548 but I don't see it
>> as the cause.
>>
>> Flip vl.c around so we pause the cpus before closing the block
>> devices, that way we shouldn't have anything trying to access them
>> when they're gone.
>>
>> This was originally Red Hat bz
>> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>
>> Signed-off-by: Dr. David Alan Gilbert
>> Reported-by: Cong Li
>>
>> --
>> This is a very rare race, I'll leave it running in a loop to see if
>> we hit anything else and to check this really fixes it.
>>
>> I do worry if there are other cases that can trigger this - e.g.
>> hot-unplug or ejecting a CD.
>>
>> ---
>>  vl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Stefan Hajnoczi

Thanks; and the test I left running seems solid - ~12k runs
over the weekend with no seg.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert"
>
> There's a rare exit seg if the guest is accessing IO during exit.
> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
> bs. This was added recently in 99723548 but I don't see it
> as the cause.
>
> Flip vl.c around so we pause the cpus before closing the block
> devices, that way we shouldn't have anything trying to access them
> when they're gone.
>
> This was originally Red Hat bz
> https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>
> Signed-off-by: Dr. David Alan Gilbert
> Reported-by: Cong Li
>
> --
> This is a very rare race, I'll leave it running in a loop to see if
> we hit anything else and to check this really fixes it.
>
> I do worry if there are other cases that can trigger this - e.g.
> hot-unplug or ejecting a CD.
>
> ---
>  vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi