Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-11-01 Thread Ryan Harper
* Markus Armbruster  [2010-10-29 09:08]:
> Ryan Harper  writes:
> 
> > Block hot unplug is racy since the guest is required to acknowledge the ACPI
> > unplug event; this may not happen synchronously with the device removal
> > command.
> >
> > This series aims to close a gap whereby mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second
> > guest, leading to data leakage.
> >
> > This series introduces a new monitor command to decouple asynchronous device
> > removal from restricting guest access to a block device.  We do this by
> > creating a new monitor command drive_unplug which maps to a bdrv_unplug()
> > command which does a qemu_aio_flush(), bdrv_flush() and bdrv_close().  Once
> > complete, subsequent IO is rejected from the device and the guest will get
> > IO errors but continue to function.
> >
> > A subsequent device removal command can be issued to remove the device, to
> > which the guest may or may not respond, but as long as the unplugged bit is
> > set, no IO will be submitted.
> >
> > Changes since v1:
> > - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
> >
> > Signed-off-by: Ryan Harper 
> > ---
> >  block.c |7 +++
> >  block.h |1 +
> >  blockdev.c  |   26 ++
> >  blockdev.h  |1 +
> >  hmp-commands.hx |   15 +++
> >  5 files changed, 50 insertions(+), 0 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index a19374d..be47655 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> > removable)
> >  }
> >  }
> >  
> > +void bdrv_unplug(BlockDriverState *bs)
> > +{
> > +qemu_aio_flush();
> > +bdrv_flush(bs);
> > +bdrv_close(bs);
> > +}
> 
> Stupid question: why doesn't bdrv_close() flush automatically?
> 
> And why do we have to flush here, but not before other uses of
> bdrv_close(), such as eject_device()?
> 
> > +
> >  int bdrv_is_removable(BlockDriverState *bs)
> >  {
> >  return bs->removable;
> > diff --git a/block.h b/block.h
> > index 5f64380..732f63e 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> > BlockErrorAction on_read_error,
> > BlockErrorAction on_write_error);
> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> > +void bdrv_unplug(BlockDriverState *bs);
> >  int bdrv_is_removable(BlockDriverState *bs);
> >  int bdrv_is_read_only(BlockDriverState *bs);
> >  int bdrv_is_sg(BlockDriverState *bs);
> > diff --git a/blockdev.c b/blockdev.c
> > index 5fc3b9b..68eb329 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
> >  }
> >  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >  }
> > +
> > +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > +{
> > +DriveInfo *dinfo;
> > +BlockDriverState *bs;
> > +const char *id;
> > +
> > +if (!qdict_haskey(qdict, "id")) {
> > +qerror_report(QERR_MISSING_PARAMETER, "id");
> > +return -1;
> > +}
> 
> As Luiz pointed out, this check is redundant.
> 
> > +
> > +id = qdict_get_str(qdict, "id");
> > +dinfo = drive_get_by_id(id);
> > +if (!dinfo) {
> > +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> > +return -1;
> > +}
> > +
> > +/* mark block device unplugged */
> > +bs = dinfo->bdrv;
> > +bdrv_unplug(bs);
> > +
> > +return 0;
> > +}
> > + 
> 
> What about:
> 
>     const char *id = qdict_get_str(qdict, "id");
>     BlockDriverState *bs;
> 
>     bs = bdrv_find(id);
>     if (!bs) {
>         qerror_report(QERR_DEVICE_NOT_FOUND, id);
>         return -1;
>     }
> 
>     bdrv_unplug(bs);
> 
>     return 0;
> 
> Precedent: commit f8b6cc00 replaced uses of drive_get_by_id() by
> bdrv_find().

That works out nicely; and I can drop the drive_get_by_id() patch as
well.  Thanks.

> 
> > diff --git a/blockdev.h b/blockdev.h
> > index 19c6915..ecb9ac8 100644
> > --- a/blockdev.h
> > +++ b/blockdev.h
> > @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
> > **ret_data);
> >  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject 
> > **ret_data);
> >  int do_change_block(Monitor *mon, const char *device,
> >  const char *filename, const char *fmt);
> > +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  
> >  #endif
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index 81999aa..7a32a2e 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
> >  ETEXI
> >  
> >  {
> > +

Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-30 Thread Christoph Hellwig
On Fri, Oct 29, 2010 at 06:08:03PM +0200, Kevin Wolf wrote:
> > I think we've got a bit of a problem.
> > 
> > We have:
> > 
> > 1) bdrv_flush() - sends an fdatasync
> > 
> > 2) bdrv_aio_flush() - sends an fdatasync using the thread pool
> > 
> > 3) qemu_aio_flush() - waits for all pending aio requests to complete
> > 
> > But we use bdrv_aio_flush() to implement a barrier and we don't actually 
> > preserve those barrier semantics in the thread pool.
> 
> Not really. We use it to implement flush commands, which I think don't
> necessarily constitute a barrier by themselves.

Yes.  Just as with normal disks qemu has absolutely no concept of I/O
barriers.  I/O barriers is an abstraction inside the Linux kernel that
we fortunately finally got rid of.

Qemu just gets a cache flush command from the guest and executes it,
usually asynchronously, as synchronous block I/O with a single
outstanding request is not very performant.  The filesystem in the guest
handles the ordering around it.

> bdrv_aio_flush, as I understand it, is meant to flush only completed
> writes.

Exactly.  The guest OS tracks writes and only issues a cache flush if
all I/Os it wants to see flushed have been ACKed by the storage hardware
/ qemu.




Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Kevin Wolf
Am 29.10.2010 17:28, schrieb Anthony Liguori:
> On 10/29/2010 09:57 AM, Kevin Wolf wrote:
>> Am 29.10.2010 16:40, schrieb Anthony Liguori:
>>
>>> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>>>  
>>>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>>>
>>>>> I don't think it's a bad idea to do that but to the extent that the
>>>>> block API is designed after posix file I/O, close does not usually imply
>>>>> flush.
>>>>>
>>>> I don't think it really resembles POSIX. More or less the only thing
>>>> they have in common is that both provide open, read, write and close,
>>>> which is something that probably any API for file accesses provides.
>>>>
>>>> The operation you're talking about here is bdrv_flush/fsync that is not
>>>> implied by a POSIX close?
>>>>
>>> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all()
>>> would be just as good.  The intention is to eliminate pending I/O
>>> requests, the fsync is just a side effect.
>>>  
>> Well, if I'm not mistaken, bdrv_flush would provide only this side
>> effect and not the semantics that you're really looking for. This is why
>> I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
>> introduce a qemu_aio_flush variant that flushes only one
>> BlockDriverState - this is what you really want.
>>
>>
>>>>>> And why do we have to flush here, but not before other uses of
>>>>>> bdrv_close(), such as eject_device()?
>>>>>>
>>>>> Good question.  Kevin should also confirm, but looking at the code, I
>>>>> think flush() is needed before close.  If there's a pending I/O event
>>>>> and you close before the I/O event is completed, you'll get a callback
>>>>> for completion against a bogus BlockDriverState.
>>>>>
>>>>> I can't find anything in either raw-posix or the generic block layer
>>>>> that would mitigate this.
>>>>>
>>>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>>>
>>>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>>>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>>>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>>>> anyway.
>>>>
>>> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which
>>> gets a little complicated in bdrv_close().
>>>  
>> qemu_aio_flush is the function that waits for requests to complete.
>>
> 
> Please excuse me while my head explodes ;-)
> 
> I think we've got a bit of a problem.
> 
> We have:
> 
> 1) bdrv_flush() - sends an fdatasync
> 
> 2) bdrv_aio_flush() - sends an fdatasync using the thread pool
> 
> 3) qemu_aio_flush() - waits for all pending aio requests to complete
> 
> But we use bdrv_aio_flush() to implement a barrier and we don't actually 
> preserve those barrier semantics in the thread pool.

Not really. We use it to implement flush commands, which I think don't
necessarily constitute a barrier by themselves.

> That is:
> 
> If I do:
> 
> bdrv_aio_write() -> A
> bdrv_aio_write() -> B
> bdrv_aio_flush() -> C
> 
> This will get queued as three requests on the thread pool.  (A) is a 
> write, (B) is a write, and (C) is a fdatasync.
> 
> But if this gets picked up by three separate threads, the ordering isn't 
> guaranteed.  It might be C, B, A.  So semantically, is bdrv_aio_flush() 
> supposed to flush any *pending* writes or any *completed* writes?  If 
> it's the latter, we're okay, but if it's the former, we're broken.

Right, so don't do that. ;-)

bdrv_aio_flush, as I understand it, is meant to flush only completed
writes. We've had this discussion before and if I understood right, this
is also how real hardware works generally. So to get barrier semantics
you as an OS need to flush your queue, i.e. you wait for A and B to
complete before you issue C.

Christoph should be able to detail on this.

Kevin
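
The "flush covers only completed writes" rule discussed above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `struct toy_disk` and the `toy_*` functions are hypothetical names invented for illustration, and none of this is QEMU code. A write that is still in flight when the flush runs is not made durable; the submitter has to wait for its completion and flush again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not QEMU code: a write is in flight,
 * completed (acknowledged to the submitter), or durable. */
enum toy_write_state { TOY_IN_FLIGHT, TOY_COMPLETED, TOY_DURABLE };

#define TOY_MAX_WRITES 8

struct toy_disk {
    enum toy_write_state writes[TOY_MAX_WRITES];
    size_t nwrites;
};

/* Submit a write; returns its index, or -1 if the table is full. */
static int toy_submit_write(struct toy_disk *d)
{
    if (d->nwrites == TOY_MAX_WRITES) {
        return -1;
    }
    d->writes[d->nwrites] = TOY_IN_FLIGHT;
    return (int)d->nwrites++;
}

/* Acknowledge a write: it has reached the (volatile) disk cache. */
static void toy_complete_write(struct toy_disk *d, int idx)
{
    assert(idx >= 0 && (size_t)idx < d->nwrites);
    if (d->writes[idx] == TOY_IN_FLIGHT) {
        d->writes[idx] = TOY_COMPLETED;
    }
}

/* Flush: only writes already completed at this point become durable.
 * Writes still in flight are left untouched. */
static void toy_flush(struct toy_disk *d)
{
    for (size_t i = 0; i < d->nwrites; i++) {
        if (d->writes[i] == TOY_COMPLETED) {
            d->writes[i] = TOY_DURABLE;
        }
    }
}

static bool toy_is_durable(const struct toy_disk *d, int idx)
{
    assert(idx >= 0 && (size_t)idx < d->nwrites);
    return d->writes[idx] == TOY_DURABLE;
}
```

In this model, submitting A and B, completing only A and then flushing leaves B non-durable, which is why a guest that wants barrier-like ordering waits for both completions before issuing the flush.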



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Markus Armbruster
Anthony Liguori  writes:

> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>> Ryan Harper  writes:
>>
>>
>>> Block hot unplug is racy since the guest is required to acknowledge the ACPI
>>> unplug event; this may not happen synchronously with the device removal
>>> command.
>>>
>>> This series aims to close a gap whereby mgmt applications that assume the
>>> block resource has been removed without confirming that the guest has
>>> acknowledged the removal may re-assign the underlying device to a second
>>> guest, leading to data leakage.
>>>
>>> This series introduces a new monitor command to decouple asynchronous device
>>> removal from restricting guest access to a block device.  We do this by
>>> creating a new monitor command drive_unplug which maps to a bdrv_unplug()
>>> command which does a qemu_aio_flush(), bdrv_flush() and bdrv_close().  Once
>>> complete, subsequent IO is rejected from the device and the guest will get
>>> IO errors but continue to function.
>>>
>>> A subsequent device removal command can be issued to remove the device, to
>>> which the guest may or may not respond, but as long as the unplugged bit is
>>> set, no IO will be submitted.
>>>
>>> Changes since v1:
>>> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>>>
>>> Signed-off-by: Ryan Harper
>>> ---
>>>   block.c |7 +++
>>>   block.h |1 +
>>>   blockdev.c  |   26 ++
>>>   blockdev.h  |1 +
>>>   hmp-commands.hx |   15 +++
>>>   5 files changed, 50 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index a19374d..be47655 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
>>> removable)
>>>   }
>>>   }
>>>
>>> +void bdrv_unplug(BlockDriverState *bs)
>>> +{
>>> +qemu_aio_flush();
>>> +bdrv_flush(bs);
>>> +bdrv_close(bs);
>>> +}
>>>  
>> Stupid question: why doesn't bdrv_close() flush automatically?
>>
>
> I don't think it's a bad idea to do that but to the extent that the
> block API is designed after posix file I/O, close does not usually
> imply flush.

There is no flush() in POSIX file I/O.  There is fsync().

There is fflush() in stdio.  fclose() flushes automatically.  Flushing
only affects stdio buffers, it doesn't imply fsync().

Based on that, a reasonable programmer could be led to believe that
bdrv_close() flushes automatically, and flushing doesn't fsync().
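
The stdio/POSIX distinction drawn above can be made concrete with a short sketch. It assumes a POSIX system; `write_durably` is a hypothetical helper name, not something from the thread. `fflush()` only moves data from the stdio buffer to the kernel, `fsync()` asks the kernel to push it to the storage device, and `fclose()` flushes the stdio buffer but never calls `fsync()`.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write text to path and push it to stable storage.
 * Returns 0 on success, -1 on any error. */
static int write_durably(const char *path, const char *text)
{
    FILE *fp;

    assert(path != NULL && text != NULL);

    fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    /* Step 1: stdio buffer -> kernel page cache. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Step 2: kernel page cache -> storage device. */
    if (fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }

    /* fclose() would have done step 1 by itself, but never step 2. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Dropping the `fsync()` call still produces a readable file after `fclose()`, but gives no guarantee that the data survives a power loss.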

>> And why do we have to flush here, but not before other uses of
>> bdrv_close(), such as eject_device()?
>>
>
> Good question.  Kevin should also confirm, but looking at the code, I
> think flush() is needed before close.  If there's a pending I/O event
> and you close before the I/O event is completed, you'll get a callback
> for completion against a bogus BlockDriverState.
>
> I can't find anything in either raw-posix or the generic block layer
> that would mitigate this.

Then bdrv_close() is too hard to use.



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Anthony Liguori

On 10/29/2010 09:57 AM, Kevin Wolf wrote:
> Am 29.10.2010 16:40, schrieb Anthony Liguori:
>> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>>> I don't think it's a bad idea to do that but to the extent that the
>>>> block API is designed after posix file I/O, close does not usually imply
>>>> flush.
>>>>
>>> I don't think it really resembles POSIX. More or less the only thing
>>> they have in common is that both provide open, read, write and close,
>>> which is something that probably any API for file accesses provides.
>>>
>>> The operation you're talking about here is bdrv_flush/fsync that is not
>>> implied by a POSIX close?
>>>
>> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all()
>> would be just as good.  The intention is to eliminate pending I/O
>> requests, the fsync is just a side effect.
>>
> Well, if I'm not mistaken, bdrv_flush would provide only this side
> effect and not the semantics that you're really looking for. This is why
> I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
> introduce a qemu_aio_flush variant that flushes only one
> BlockDriverState - this is what you really want.
>
>>>>> And why do we have to flush here, but not before other uses of
>>>>> bdrv_close(), such as eject_device()?
>>>>>
>>>> Good question.  Kevin should also confirm, but looking at the code, I
>>>> think flush() is needed before close.  If there's a pending I/O event
>>>> and you close before the I/O event is completed, you'll get a callback
>>>> for completion against a bogus BlockDriverState.
>>>>
>>>> I can't find anything in either raw-posix or the generic block layer
>>>> that would mitigate this.
>>>>
>>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>>
>>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>>> anyway.
>>>
>> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which
>> gets a little complicated in bdrv_close().
>>
> qemu_aio_flush is the function that waits for requests to complete.

Please excuse me while my head explodes ;-)

I think we've got a bit of a problem.

We have:

1) bdrv_flush() - sends an fdatasync

2) bdrv_aio_flush() - sends an fdatasync using the thread pool

3) qemu_aio_flush() - waits for all pending aio requests to complete

But we use bdrv_aio_flush() to implement a barrier and we don't actually 
preserve those barrier semantics in the thread pool.


That is:

If I do:

bdrv_aio_write() -> A
bdrv_aio_write() -> B
bdrv_aio_flush() -> C

This will get queued as three requests on the thread pool.  (A) is a 
write, (B) is a write, and (C) is a fdatasync.


But if this gets picked up by three separate threads, the ordering isn't 
guaranteed.  It might be C, B, A.  So semantically, is bdrv_aio_flush() 
supposed to flush any *pending* writes or any *completed* writes?  If 
it's the latter, we're okay, but if it's the former, we're broken.


If it's supposed to flush any pending writes, then my patch series is 
correct in theory.


Regards,

Anthony Liguori


>> I think it would be better
>> to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush
>> method isn't provided.  Something like the attached (still need to test).
>>
>> Does that seem reasonable?
>
> I'm not sure why you want to introduce this emulation. Are there any
> drivers that implement bdrv_aio_flush, but not bdrv_flush? They are
> definitely broken.
>
> Today, bdrv_aio_flush is emulated using bdrv_flush if the driver doesn't
> provide it explicitly.
>
> I think this also means that your first patch would kill any drivers
> implementing neither bdrv_flush nor bdrv_aio_flush because they'd try to
> emulate the other function in an endless recursion.
>
> And apart from that, as said above, bdrv_flush doesn't do the right
> thing anyway. ;-)
>
> Kevin





Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Kevin Wolf
Am 29.10.2010 16:40, schrieb Anthony Liguori:
> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>> I don't think it's a bad idea to do that but to the extent that the
>>> block API is designed after posix file I/O, close does not usually imply
>>> flush.
>>>  
>> I don't think it really resembles POSIX. More or less the only thing
>> they have in common is that both provide open, read, write and close,
>> which is something that probably any API for file accesses provides.
>>
>> The operation you're talking about here is bdrv_flush/fsync that is not
>> implied by a POSIX close?
>>
> 
> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all() 
> would be just as good.  The intention is to eliminate pending I/O 
> requests, the fsync is just a side effect.

Well, if I'm not mistaken, bdrv_flush would provide only this side
effect and not the semantics that you're really looking for. This is why
I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
introduce a qemu_aio_flush variant that flushes only one
BlockDriverState - this is what you really want.

>>>> And why do we have to flush here, but not before other uses of
>>>> bdrv_close(), such as eject_device()?
>>>>
>>> Good question.  Kevin should also confirm, but looking at the code, I
>>> think flush() is needed before close.  If there's a pending I/O event
>>> and you close before the I/O event is completed, you'll get a callback
>>> for completion against a bogus BlockDriverState.
>>>
>>> I can't find anything in either raw-posix or the generic block layer
>>> that would mitigate this.
>>>  
>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>
>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>> anyway.
>>
> 
> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which 
> gets a little complicated in bdrv_close().  

qemu_aio_flush is the function that waits for requests to complete.

> I think it would be better 
> to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush 
> method isn't provided.  Something like the attached (still need to test).
> 
> Does that seem reasonable?

I'm not sure why you want to introduce this emulation. Are there any
drivers that implement bdrv_aio_flush, but not bdrv_flush? They are
definitely broken.

Today, bdrv_aio_flush is emulated using bdrv_flush if the driver doesn't
provide it explicitly.

I think this also means that your first patch would kill any drivers
implementing neither bdrv_flush nor bdrv_aio_flush because they'd try to
emulate the other function in an endless recursion.

And apart from that, as said above, bdrv_flush doesn't do the right
thing anyway. ;-)

Kevin
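
The recursion hazard described above arises when each generic flush entry point emulates itself through the other generic entry point. One way to avoid it, sketched here with a hypothetical toy driver table (`struct toy_driver` and the `toy_*` names are assumptions for illustration, not QEMU's BlockDriver interface), is to let each fallback consult only the driver's own callbacks, so a driver that provides neither ends in a no-op. A real synchronous emulation would also have to wait for the asynchronous request to complete, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy driver table, not QEMU's BlockDriver. */
struct toy_driver {
    void (*flush)(void);      /* native synchronous flush, may be NULL */
    void (*aio_flush)(void);  /* native asynchronous flush, may be NULL */
};

enum toy_result { TOY_NOOP, TOY_NATIVE, TOY_EMULATED };

static int toy_native_calls;  /* counts how often a native callback ran */

static void toy_native_cb(void)
{
    toy_native_calls++;
}

/* Generic synchronous flush: prefer the native callback, otherwise
 * emulate with the driver's own aio_flush callback.  It never calls
 * toy_aio_flush(), so the two wrappers cannot bounce between each other. */
static enum toy_result toy_flush(const struct toy_driver *drv)
{
    assert(drv != NULL);
    if (drv->flush) {
        drv->flush();
        return TOY_NATIVE;
    }
    if (drv->aio_flush) {
        drv->aio_flush();
        return TOY_EMULATED;
    }
    return TOY_NOOP;  /* neither callback: nothing to do */
}

/* Generic asynchronous flush: the mirror image, with the same rule. */
static enum toy_result toy_aio_flush(const struct toy_driver *drv)
{
    assert(drv != NULL);
    if (drv->aio_flush) {
        drv->aio_flush();
        return TOY_NATIVE;
    }
    if (drv->flush) {
        drv->flush();
        return TOY_EMULATED;
    }
    return TOY_NOOP;
}
```

Had `toy_flush()` fallen back to `toy_aio_flush()` and vice versa, a driver with both callbacks unset would recurse until the stack overflowed.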



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Anthony Liguori

On 10/29/2010 09:29 AM, Kevin Wolf wrote:
> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>>> Ryan Harper writes:
>>>> diff --git a/block.c b/block.c
>>>> index a19374d..be47655 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>>>      }
>>>>  }
>>>>
>>>> +void bdrv_unplug(BlockDriverState *bs)
>>>> +{
>>>> +    qemu_aio_flush();
>>>> +    bdrv_flush(bs);
>>>> +    bdrv_close(bs);
>>>> +}
>>>>
>>> Stupid question: why doesn't bdrv_close() flush automatically?
>>>
>> I don't think it's a bad idea to do that but to the extent that the
>> block API is designed after posix file I/O, close does not usually imply
>> flush.
>>
> I don't think it really resembles POSIX. More or less the only thing
> they have in common is that both provide open, read, write and close,
> which is something that probably any API for file accesses provides.
>
> The operation you're talking about here is bdrv_flush/fsync that is not
> implied by a POSIX close?

Yes.  But I think for the purposes of this patch, a bdrv_cancel_all() 
would be just as good.  The intention is to eliminate pending I/O 
requests, the fsync is just a side effect.



>>> And why do we have to flush here, but not before other uses of
>>> bdrv_close(), such as eject_device()?
>>>
>> Good question.  Kevin should also confirm, but looking at the code, I
>> think flush() is needed before close.  If there's a pending I/O event
>> and you close before the I/O event is completed, you'll get a callback
>> for completion against a bogus BlockDriverState.
>>
>> I can't find anything in either raw-posix or the generic block layer
>> that would mitigate this.
>>
> I'm not aware of anything either. This is what qemu_aio_flush would do.
>
> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
> bdrv_close. We probably don't really need to call bdrv_flush to operate
> correctly, but it can't hurt and bdrv_close shouldn't happen that often
> anyway.


I agree.  Re: qemu_aio_flush, we have to wait for it to complete which 
gets a little complicated in bdrv_close().  I think it would be better 
to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush 
method isn't provided.  Something like the attached (still need to test).


Does that seem reasonable?

Regards,

Anthony Liguori


> Kevin


From 86bf3c9eb5ce43280224f9271a4ad016b0dd3fb1 Mon Sep 17 00:00:00 2001
From: Anthony Liguori 
Date: Fri, 29 Oct 2010 09:36:53 -0500
Subject: [PATCH 1/2] block: make bdrv_flush() fall back to bdrv_aio_flush

Signed-off-by: Anthony Liguori 

diff --git a/block.c b/block.c
index 985d0b7..fc8defd 100644
--- a/block.c
+++ b/block.c
@@ -1453,14 +1453,51 @@ const char *bdrv_get_device_name(BlockDriverState *bs)
     return bs->device_name;
 }
 
+static void bdrv_flush_em_cb(void *opaque, int ret)
+{
+    int *pcomplete = opaque;
+    *pcomplete = 1;
+}
+
+static void bdrv_flush_em(BlockDriverState *bs)
+{
+    int complete = 0;
+    BlockDriverAIOCB *acb;
+
+    if (!bs->drv->bdrv_aio_flush) {
+        return;
+    }
+
+    async_context_push();
+
+    acb = bs->drv->bdrv_aio_flush(bs, bdrv_flush_em_cb, &complete);
+    if (!acb) {
+        goto out;
+    }
+
+    while (!complete) {
+        qemu_aio_wait();
+    }
+
+out:
+    async_context_pop();
+}
+
 void bdrv_flush(BlockDriverState *bs)
 {
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
         return;
     }
 
-    if (bs->drv && bs->drv->bdrv_flush)
+    if (!bs->drv) {
+        return;
+    }
+
+    if (bs->drv->bdrv_flush) {
         bs->drv->bdrv_flush(bs);
+    } else {
+        bdrv_flush_em(bs);
+    }
 }
 
 void bdrv_flush_all(void)
-- 
1.7.0.4

From 094049974796ddf78ee2f1541bffa40fe1176a1a Mon Sep 17 00:00:00 2001
From: Anthony Liguori 
Date: Fri, 29 Oct 2010 09:37:25 -0500
Subject: [PATCH 2/2] block: add bdrv_flush to bdrv_close

To ensure that there are no pending completions before destroying a block
device.

Signed-off-by: Anthony Liguori 

diff --git a/block.c b/block.c
index fc8defd..d2aed1b 100644
--- a/block.c
+++ b/block.c
@@ -644,6 +644,8 @@ unlink_and_fail:
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
+        bdrv_flush(bs);
+
         if (bs == bs_snapshots) {
             bs_snapshots = NULL;
         }
-- 
1.7.0.4



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Kevin Wolf
Am 29.10.2010 16:15, schrieb Anthony Liguori:
> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>> Ryan Harper  writes:
>>> diff --git a/block.c b/block.c
>>> index a19374d..be47655 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
>>> removable)
>>>   }
>>>   }
>>>
>>> +void bdrv_unplug(BlockDriverState *bs)
>>> +{
>>> +qemu_aio_flush();
>>> +bdrv_flush(bs);
>>> +bdrv_close(bs);
>>> +}
>>>  
>> Stupid question: why doesn't bdrv_close() flush automatically?
>>
> 
> I don't think it's a bad idea to do that but to the extent that the 
> block API is designed after posix file I/O, close does not usually imply 
> flush.

I don't think it really resembles POSIX. More or less the only thing
they have in common is that both provide open, read, write and close,
which is something that probably any API for file accesses provides.

The operation you're talking about here is bdrv_flush/fsync that is not
implied by a POSIX close?

>> And why do we have to flush here, but not before other uses of
>> bdrv_close(), such as eject_device()?
>>
> 
> Good question.  Kevin should also confirm, but looking at the code, I 
> think flush() is needed before close.  If there's a pending I/O event 
> and you close before the I/O event is completed, you'll get a callback 
> for completion against a bogus BlockDriverState.
> 
> I can't find anything in either raw-posix or the generic block layer 
> that would mitigate this.

I'm not aware of anything either. This is what qemu_aio_flush would do.

It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
bdrv_close. We probably don't really need to call bdrv_flush to operate
correctly, but it can't hurt and bdrv_close shouldn't happen that often
anyway.

Kevin



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Anthony Liguori

On 10/29/2010 09:01 AM, Markus Armbruster wrote:
> Ryan Harper writes:
>
>> Block hot unplug is racy since the guest is required to acknowledge the ACPI
>> unplug event; this may not happen synchronously with the device removal
>> command.
>>
>> This series aims to close a gap whereby mgmt applications that assume the
>> block resource has been removed without confirming that the guest has
>> acknowledged the removal may re-assign the underlying device to a second
>> guest, leading to data leakage.
>>
>> This series introduces a new monitor command to decouple asynchronous device
>> removal from restricting guest access to a block device.  We do this by
>> creating a new monitor command drive_unplug which maps to a bdrv_unplug()
>> command which does a qemu_aio_flush(), bdrv_flush() and bdrv_close().  Once
>> complete, subsequent IO is rejected from the device and the guest will get
>> IO errors but continue to function.
>>
>> A subsequent device removal command can be issued to remove the device, to
>> which the guest may or may not respond, but as long as the unplugged bit is
>> set, no IO will be submitted.
>>
>> Changes since v1:
>> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>>
>> Signed-off-by: Ryan Harper
>> ---
>>   block.c |7 +++
>>   block.h |1 +
>>   blockdev.c  |   26 ++
>>   blockdev.h  |1 +
>>   hmp-commands.hx |   15 +++
>>   5 files changed, 50 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index a19374d..be47655 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>      }
>>  }
>>
>> +void bdrv_unplug(BlockDriverState *bs)
>> +{
>> +    qemu_aio_flush();
>> +    bdrv_flush(bs);
>> +    bdrv_close(bs);
>> +}
>>
> Stupid question: why doesn't bdrv_close() flush automatically?


I don't think it's a bad idea to do that but to the extent that the 
block API is designed after posix file I/O, close does not usually imply 
flush.



> And why do we have to flush here, but not before other uses of
> bdrv_close(), such as eject_device()?


Good question.  Kevin should also confirm, but looking at the code, I 
think flush() is needed before close.  If there's a pending I/O event 
and you close before the I/O event is completed, you'll get a callback 
for completion against a bogus BlockDriverState.


I can't find anything in either raw-posix or the generic block layer 
that would mitigate this.


Regards,

Anthony Liguori
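
The hazard described above, a completion arriving for a device that has already been closed, and the drain-then-close ordering that bdrv_unplug() uses to avoid it, can be sketched with a toy model. Everything here is hypothetical (`struct toy_bs` and the `toy_*` functions are invented for illustration and are not QEMU's BlockDriverState API); the sketch only shows why waiting for pending requests has to come before the close, and that I/O submitted afterwards is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy block device, not QEMU's BlockDriverState. */
struct toy_bs {
    bool open;
    int pending;          /* requests submitted but not yet completed */
    int completed_ok;     /* completions delivered while the device was open */
    int completed_stale;  /* completions delivered after close (the bug) */
};

/* Submit a request; returns -1 (an I/O error) once the device is closed. */
static int toy_submit(struct toy_bs *bs)
{
    if (!bs->open) {
        return -1;
    }
    bs->pending++;
    return 0;
}

/* Deliver one completion callback, recording whether it hit a live device. */
static void toy_complete_one(struct toy_bs *bs)
{
    if (bs->pending == 0) {
        return;
    }
    bs->pending--;
    if (bs->open) {
        bs->completed_ok++;
    } else {
        bs->completed_stale++;
    }
}

/* Wait until every pending request has completed. */
static void toy_drain(struct toy_bs *bs)
{
    while (bs->pending > 0) {
        toy_complete_one(bs);
    }
}

/* Unplug: drain first, then close, so no completion sees a closed device. */
static void toy_unplug(struct toy_bs *bs)
{
    toy_drain(bs);
    assert(bs->pending == 0);
    bs->open = false;
}
```

Closing without the drain step would leave `pending` non-zero, and each later completion would be counted in `completed_stale`, the toy equivalent of a callback against a bogus BlockDriverState.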



Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-29 Thread Markus Armbruster
Ryan Harper  writes:

> Block hot unplug is racy since the guest is required to acknowledge the ACPI
> unplug event; this may not happen synchronously with the device removal
> command.
>
> This series aims to close a gap whereby mgmt applications that assume the
> block resource has been removed without confirming that the guest has
> acknowledged the removal may re-assign the underlying device to a second
> guest, leading to data leakage.
>
> This series introduces a new monitor command to decouple asynchronous device
> removal from restricting guest access to a block device.  We do this by
> creating a new monitor command drive_unplug which maps to a bdrv_unplug()
> command which does a qemu_aio_flush(), bdrv_flush() and bdrv_close().  Once
> complete, subsequent IO is rejected from the device and the guest will get
> IO errors but continue to function.
>
> A subsequent device removal command can be issued to remove the device, to
> which the guest may or may not respond, but as long as the unplugged bit is
> set, no IO will be submitted.
>
> Changes since v1:
> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>
> Signed-off-by: Ryan Harper 
> ---
>  block.c |7 +++
>  block.h |1 +
>  blockdev.c  |   26 ++
>  blockdev.h  |1 +
>  hmp-commands.hx |   15 +++
>  5 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index a19374d..be47655 100644
> --- a/block.c
> +++ b/block.c
> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> removable)
>  }
>  }
>  
> +void bdrv_unplug(BlockDriverState *bs)
> +{
> +qemu_aio_flush();
> +bdrv_flush(bs);
> +bdrv_close(bs);
> +}

Stupid question: why doesn't bdrv_close() flush automatically?

And why do we have to flush here, but not before other uses of
bdrv_close(), such as eject_device()?

> +
>  int bdrv_is_removable(BlockDriverState *bs)
>  {
>  return bs->removable;
> diff --git a/block.h b/block.h
> index 5f64380..732f63e 100644
> --- a/block.h
> +++ b/block.h
> @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> BlockErrorAction on_read_error,
> BlockErrorAction on_write_error);
>  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>  void bdrv_set_removable(BlockDriverState *bs, int removable);
> +void bdrv_unplug(BlockDriverState *bs);
>  int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
> diff --git a/blockdev.c b/blockdev.c
> index 5fc3b9b..68eb329 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
>  }
>  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>  }
> +
> +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +DriveInfo *dinfo;
> +BlockDriverState *bs;
> +const char *id;
> +
> +if (!qdict_haskey(qdict, "id")) {
> +qerror_report(QERR_MISSING_PARAMETER, "id");
> +return -1;
> +}

As Luiz pointed out, this check is redundant.

> +
> +id = qdict_get_str(qdict, "id");
> +dinfo = drive_get_by_id(id);
> +if (!dinfo) {
> +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> +return -1;
> +}
> +
> +/* mark block device unplugged */
> +bs = dinfo->bdrv;
> +bdrv_unplug(bs);
> +
> +return 0;
> +}
> + 

What about:

    const char *id = qdict_get_str(qdict, "id");
    BlockDriverState *bs;

    bs = bdrv_find(id);
    if (!bs) {
        qerror_report(QERR_DEVICE_NOT_FOUND, id);
        return -1;
    }

    bdrv_unplug(bs);

    return 0;

Precedent: commit f8b6cc00 replaced uses of drive_get_by_id() with
bdrv_find().
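
To illustrate the shape of that suggestion outside of QEMU, here is a
minimal, self-contained sketch. The toy_* names and the static device
table are hypothetical stand-ins, not QEMU APIs; only the control flow
(one lookup by name, one error path, then the action) mirrors the
snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block driver state; NOT QEMU's type. */
struct toy_bs {
    const char *name;
    bool unplugged;
};

/* Hypothetical device table, standing in for the global device list. */
struct toy_bs toy_devices[] = {
    { "drive0", false },
    { "drive1", false },
};

/* Look a device up directly by name (the role bdrv_find() plays above). */
struct toy_bs *toy_find(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(toy_devices) / sizeof(toy_devices[0]); i++) {
        if (strcmp(toy_devices[i].name, name) == 0) {
            return &toy_devices[i];
        }
    }
    return NULL;
}

/* Handler shape: single lookup, single error path, then the action. */
int toy_do_unplug(const char *id)
{
    struct toy_bs *bs = toy_find(id);

    if (!bs) {
        /* a real handler would report a "device not found" error here */
        return -1;
    }
    bs->unplugged = true;
    return 0;
}
```

Calling toy_do_unplug("drive0") marks only that entry and returns 0,
while an unknown id returns -1 without touching any device.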

> diff --git a/blockdev.h b/blockdev.h
> index 19c6915..ecb9ac8 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  int do_change_block(Monitor *mon, const char *device,
>  const char *filename, const char *fmt);
> +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  
>  #endif
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 81999aa..7a32a2e 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
>  ETEXI
>  
>  {
> +.name   = "drive_unplug",
> +.args_type  = "id:s",
> +.params = "device",
> +.help   = "unplug block device",
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_drive_unplug,
> +},
> +
> +STEXI
> +@item unplug @var{device}
> +@findex unplug
> +Unplug block device.

A bit terse, isn't it?  What does it mean to unplug a block device?
What's its observable effect on 

[Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()

2010-10-25 Thread Ryan Harper
Block hot unplug is racy since the guest is required to acknowledge the ACPI
unplug event; this may not happen synchronously with the device removal command.

This series aims to close a gap whereby mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest,
leading to data leakage.

This series introduces a new monitor command to decouple asynchronous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_unplug which maps to a bdrv_unplug() command which
does a qemu_aio_flush(); bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.

A subsequent device removal command can be issued to remove the device, to which
the guest may or may not respond, but as long as the unplugged bit is set, no IO
will be submitted.
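
As a rough illustration of the ordering described above (this is a toy
model, not QEMU code), the sketch below drains in-flight requests, flushes
completed writes, then closes the device so that later submissions fail.
All toy_* names and the struct fields are hypothetical; the three steps
only play the roles of qemu_aio_flush(), bdrv_flush() and bdrv_close().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block device; NOT QEMU's BlockDriverState. */
struct toy_blockdev {
    bool open;      /* false once the device has been closed */
    int in_flight;  /* writes submitted but not yet completed */
    int dirty;      /* completed writes not yet flushed */
    int flushed;    /* writes that reached the backing store */
};

/* Submit a write; rejected with -EIO once the device is closed. */
int toy_submit_write(struct toy_blockdev *dev)
{
    assert(dev != NULL);
    if (!dev->open) {
        return -EIO;
    }
    dev->in_flight++;
    return 0;
}

/* Step 1: wait for pending I/O to complete. */
void toy_drain(struct toy_blockdev *dev)
{
    while (dev->in_flight > 0) {
        dev->in_flight--;
        dev->dirty++;
    }
}

/* Step 2: push completed writes out to the backing store. */
void toy_flush(struct toy_blockdev *dev)
{
    dev->flushed += dev->dirty;
    dev->dirty = 0;
}

/* Step 3: close the device; from here on submissions are refused. */
void toy_close(struct toy_blockdev *dev)
{
    dev->open = false;
}

/* Unplug = drain, then flush, then close, in that order. */
void toy_unplug(struct toy_blockdev *dev)
{
    assert(dev != NULL);
    toy_drain(dev);
    toy_flush(dev);
    toy_close(dev);
}
```

In this model, flushing before draining would leave writes that were
still in flight unflushed when the device closes, which is the gap the
added drain step is meant to cover.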

Changes since v1:
- Added qemu_aio_flush() before bdrv_flush() to wait on pending io

Signed-off-by: Ryan Harper 
---
 block.c         |    7 +++
 block.h         |    1 +
 blockdev.c      |   26 ++
 blockdev.h      |    1 +
 hmp-commands.hx |   15 +++
 5 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a19374d..be47655 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+    qemu_aio_flush();
+    bdrv_flush(bs);
+    bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs->removable;
diff --git a/block.h b/block.h
index 5f64380..732f63e 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 5fc3b9b..68eb329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    DriveInfo *dinfo;
+    BlockDriverState *bs;
+    const char *id;
+
+    if (!qdict_haskey(qdict, "id")) {
+        qerror_report(QERR_MISSING_PARAMETER, "id");
+        return -1;
+    }
+
+    id = qdict_get_str(qdict, "id");
+    dinfo = drive_get_by_id(id);
+    if (!dinfo) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, id);
+        return -1;
+    }
+
+    /* mark block device unplugged */
+    bs = dinfo->bdrv;
+    bdrv_unplug(bs);
+
+    return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 19c6915..ecb9ac8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..7a32a2e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = "drive_unplug",
+.args_type  = "id:s",
+.params = "device",
+.help   = "unplug block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_unplug,
+},
+
+STEXI
+@item unplug @var{device}
+@findex unplug
+Unplug block device.
+ETEXI
+
+{
 .name   = "change",
 .args_type  = "device:B,target:F,arg:s?",
 .params = "device filename [format]",
-- 
1.6.3.3