Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-11 Thread Markus Armbruster
Ryan Harper  writes:

> * Markus Armbruster  [2010-11-11 04:48]:
>> Ryan Harper  writes:
>> 
>> > * Markus Armbruster  [2010-11-10 11:40]:
>> >> Ryan Harper  writes:
>> >> 
>> >> > * Markus Armbruster  [2010-11-10 06:48]:
>> >> >> One real question, and a couple of nits.
>> >> >> 
>> >> >> Ryan Harper  writes:
>> >> >> 
>> >> >> > Block hot unplug is racy since the guest is required to acknowlege 
>> >> >> > the ACPI
>> >> >> > unplug event; this may not happen synchronously with the device 
>> >> >> > removal command
>> >> >> 
>> >> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
>> >> >> length of time, possibly forever.  To make a race, you need to throw in
>> >> >> a client assuming (incorrectly) that unplug is instantaneous, as
>> >> >> described in your next paragraph.
>> >> >> 
>> >> >> Moreover, all PCI unplug is that way, not just block.
>> >> >> 
>> >> >> > This series aims to close a gap where by mgmt applications that 
>> >> >> > assume the
>> >> >> > block resource has been removed without confirming that the guest has
>> >> >> > acknowledged the removal may re-assign the underlying device to a 
>> >> >> > second guest
>> >> >> > leading to data leakage.
>> >> >> 
>> >> >> Yes, the incorrect assumption is a problem.  But with that fixed (in 
>> >> >> the
>> >> >> management application), we run right into the next problem: there is 
>> >> >> no
>> >> >> way for the management application to reliably disconnect the guest 
>> >> >> from
>> >> >> a block device.  And that's the problem you're fixing.
>> >> >
>> >> > Yeah, that's the right way to word it; providing a method to forcibly
>> >> > disconnect the guest from the host device.
>> >> >> 
>> >> >> > This series introduces a new montor command to decouple asynchornous 
>> >> >> > device
>> >> >> 
>> >> >> Typos "montor" and "asynchornous".  You might want to use a spell
>> >> >> checker :)
>> >> >> 
>> >> >> Lines are a bit long.  Recommend wrap at column 70.
>> >> >> 
>> >> >> > removal from restricting guest access to a block device.  We do this 
>> >> >> > by creating
>> >> >> > a new monitor command drive_del which maps to a bdrv_unplug() 
>> >> >> > command which
>> >> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once 
>> >> >> > complete, subsequent
>> >> >> > IO is rejected from the device and the guest will get IO errors but 
>> >> >> > continue to
>> >> >> > function.  In addition to preventing further IO, we clean up state 
>> >> >> > pointers
>> >> >> > between host (BlockDriverState) and guest (DeviceInfo).
>> >> >> >
>> >> >> > A subsequent device removal command can be issued to remove the 
>> >> >> > device, to which
>> >> >> > the guest may or maynot respond, but as long as the unplugged bit is 
>> >> >> > set, no IO
>> >> >> 
>> >> >> "maynot" is not a word.
>> >> >> 
>> >> >> > will be sumbitted.
>> >> >> 
>> >> >> This suggests to drive_del before device_del, which makes the device
>> >> >> goes through a "broken device" state on its way to unplug.  If the 
>> >> >> guest
>> >> >> accesses the device in that state, it gets I/O errors.  Not nice.
>> >> >> 
>> >> >> Instead, I'd recommend device_del, wait for the device to go away,
>> >> >> drive_del on time out.  If the guest reacts to the ACPI unplug 
>> >> >> promptly,
>> >> >> it's never exposed to the "broken device" state.  Note: if the 
>> >> >> drive_del
>> >> >> fails because the device doesn't exist, we lost the race with the
>> >> >> automatic destruction, which is harmless.  Ignore that error.
>> >> >
>> >> > Honestly, other than describing what happens if you sever the connection
>> >> > when the guest isn't aware of it; I don't want to try to capture how the
>> >> > mgmt layer implements the removal.  
>> >> >
>> >> > One may want to force the disconnect before attempting to remove the
>> >> > device; or the other way around; that's really the mgmt layer's call.
>> >> 
>> >> Fair enough.
>> >> 
>> >> >> > Signed-off-by: Ryan Harper 
>> >> >> > ---
>> >> >> >  block.c |7 +++
>> >> >> >  block.h |1 +
>> >> >> >  blockdev.c  |   36 
>> >> >> >  blockdev.h  |1 +
>> >> >> >  hmp-commands.hx |   18 ++
>> >> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
>> >> >> >
>> >> >> > diff --git a/block.c b/block.c
>> >> >> > index 6b505fb..c76a796 100644
>> >> >> > --- a/block.c
>> >> >> > +++ b/block.c
>> >> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, 
>> >> >> > int removable)
>> >> >> >  }
>> >> >> >  }
>> >> >> >  
>> >> >> > +void bdrv_unplug(BlockDriverState *bs)
>> >> >> > +{
>> >> >> > +qemu_aio_flush();
>> >> >> > +bdrv_flush(bs);
>> >> >> > +bdrv_close(bs);
>> >> >> > +}
>> >> >> > +
>> >> >> 
>> >> >> Unless we expect more users, I'd inline this into its only caller.
>> >> >> Matter of taste.
>> >> >
>> >> > Works for me.
>> >> >
>> >> >> 
>> >> >> >  int bdrv_is_removable(BlockDriv

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-11 Thread Ryan Harper
* Markus Armbruster  [2010-11-11 04:48]:
> Ryan Harper  writes:
> 
> > * Markus Armbruster  [2010-11-10 11:40]:
> >> Ryan Harper  writes:
> >> 
> >> > * Markus Armbruster  [2010-11-10 06:48]:
> >> >> One real question, and a couple of nits.
> >> >> 
> >> >> Ryan Harper  writes:
> >> >> 
> >> >> > Block hot unplug is racy since the guest is required to acknowlege 
> >> >> > the ACPI
> >> >> > unplug event; this may not happen synchronously with the device 
> >> >> > removal command
> >> >> 
> >> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> >> >> length of time, possibly forever.  To make a race, you need to throw in
> >> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> >> described in your next paragraph.
> >> >> 
> >> >> Moreover, all PCI unplug is that way, not just block.
> >> >> 
> >> >> > This series aims to close a gap where by mgmt applications that 
> >> >> > assume the
> >> >> > block resource has been removed without confirming that the guest has
> >> >> > acknowledged the removal may re-assign the underlying device to a 
> >> >> > second guest
> >> >> > leading to data leakage.
> >> >> 
> >> >> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> >> >> management application), we run right into the next problem: there is no
> >> >> way for the management application to reliably disconnect the guest from
> >> >> a block device.  And that's the problem you're fixing.
> >> >
> >> > Yeah, that's the right way to word it; providing a method to forcibly
> >> > disconnect the guest from the host device.
> >> >> 
> >> >> > This series introduces a new montor command to decouple asynchornous 
> >> >> > device
> >> >> 
> >> >> Typos "montor" and "asynchornous".  You might want to use a spell
> >> >> checker :)
> >> >> 
> >> >> Lines are a bit long.  Recommend wrap at column 70.
> >> >> 
> >> >> > removal from restricting guest access to a block device.  We do this 
> >> >> > by creating
> >> >> > a new monitor command drive_del which maps to a bdrv_unplug() command 
> >> >> > which
> >> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
> >> >> > subsequent
> >> >> > IO is rejected from the device and the guest will get IO errors but 
> >> >> > continue to
> >> >> > function.  In addition to preventing further IO, we clean up state 
> >> >> > pointers
> >> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >> >
> >> >> > A subsequent device removal command can be issued to remove the 
> >> >> > device, to which
> >> >> > the guest may or maynot respond, but as long as the unplugged bit is 
> >> >> > set, no IO
> >> >> 
> >> >> "maynot" is not a word.
> >> >> 
> >> >> > will be sumbitted.
> >> >> 
> >> >> This suggests to drive_del before device_del, which makes the device
> >> >> goes through a "broken device" state on its way to unplug.  If the guest
> >> >> accesses the device in that state, it gets I/O errors.  Not nice.
> >> >> 
> >> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> >> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> >> >> it's never exposed to the "broken device" state.  Note: if the drive_del
> >> >> fails because the device doesn't exist, we lost the race with the
> >> >> automatic destruction, which is harmless.  Ignore that error.
> >> >
> >> > Honestly, other than describing what happens if you sever the connection
> >> > when the guest isn't aware of it; I don't want to try to capture how the
> >> > mgmt layer implements the removal.  
> >> >
> >> > One may want to force the disconnect before attempting to remove the
> >> > device; or the other way around; that's really the mgmt layer's call.
> >> 
> >> Fair enough.
> >> 
> >> >> > Signed-off-by: Ryan Harper 
> >> >> > ---
> >> >> >  block.c |7 +++
> >> >> >  block.h |1 +
> >> >> >  blockdev.c  |   36 
> >> >> >  blockdev.h  |1 +
> >> >> >  hmp-commands.hx |   18 ++
> >> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >> >> >
> >> >> > diff --git a/block.c b/block.c
> >> >> > index 6b505fb..c76a796 100644
> >> >> > --- a/block.c
> >> >> > +++ b/block.c
> >> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, 
> >> >> > int removable)
> >> >> >  }
> >> >> >  }
> >> >> >  
> >> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> >> > +{
> >> >> > +qemu_aio_flush();
> >> >> > +bdrv_flush(bs);
> >> >> > +bdrv_close(bs);
> >> >> > +}
> >> >> > +
> >> >> 
> >> >> Unless we expect more users, I'd inline this into its only caller.
> >> >> Matter of taste.
> >> >
> >> > Works for me.
> >> >
> >> >> 
> >> >> >  int bdrv_is_removable(BlockDriverState *bs)
> >> >> >  {
> >> >> >  return bs->removable;
> >> >> > diff --git a/block.h b/block.h
> >> >> > index 78ecfac..581414c 100644
> >> >> > --- a/block.h
> >> >> > +++ b/block.h
> >> >> > 

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-11 Thread Markus Armbruster
Ryan Harper  writes:

> * Markus Armbruster  [2010-11-10 11:40]:
>> Ryan Harper  writes:
>> 
>> > * Markus Armbruster  [2010-11-10 06:48]:
>> >> One real question, and a couple of nits.
>> >> 
>> >> Ryan Harper  writes:
>> >> 
>> >> > Block hot unplug is racy since the guest is required to acknowlege the 
>> >> > ACPI
>> >> > unplug event; this may not happen synchronously with the device removal 
>> >> > command
>> >> 
>> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
>> >> length of time, possibly forever.  To make a race, you need to throw in
>> >> a client assuming (incorrectly) that unplug is instantaneous, as
>> >> described in your next paragraph.
>> >> 
>> >> Moreover, all PCI unplug is that way, not just block.
>> >> 
>> >> > This series aims to close a gap where by mgmt applications that assume 
>> >> > the
>> >> > block resource has been removed without confirming that the guest has
>> >> > acknowledged the removal may re-assign the underlying device to a 
>> >> > second guest
>> >> > leading to data leakage.
>> >> 
>> >> Yes, the incorrect assumption is a problem.  But with that fixed (in the
>> >> management application), we run right into the next problem: there is no
>> >> way for the management application to reliably disconnect the guest from
>> >> a block device.  And that's the problem you're fixing.
>> >
>> > Yeah, that's the right way to word it; providing a method to forcibly
>> > disconnect the guest from the host device.
>> >> 
>> >> > This series introduces a new montor command to decouple asynchornous 
>> >> > device
>> >> 
>> >> Typos "montor" and "asynchornous".  You might want to use a spell
>> >> checker :)
>> >> 
>> >> Lines are a bit long.  Recommend wrap at column 70.
>> >> 
>> >> > removal from restricting guest access to a block device.  We do this by 
>> >> > creating
>> >> > a new monitor command drive_del which maps to a bdrv_unplug() command 
>> >> > which
>> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
>> >> > subsequent
>> >> > IO is rejected from the device and the guest will get IO errors but 
>> >> > continue to
>> >> > function.  In addition to preventing further IO, we clean up state 
>> >> > pointers
>> >> > between host (BlockDriverState) and guest (DeviceInfo).
>> >> >
>> >> > A subsequent device removal command can be issued to remove the device, 
>> >> > to which
>> >> > the guest may or maynot respond, but as long as the unplugged bit is 
>> >> > set, no IO
>> >> 
>> >> "maynot" is not a word.
>> >> 
>> >> > will be sumbitted.
>> >> 
>> >> This suggests to drive_del before device_del, which makes the device
>> >> goes through a "broken device" state on its way to unplug.  If the guest
>> >> accesses the device in that state, it gets I/O errors.  Not nice.
>> >> 
>> >> Instead, I'd recommend device_del, wait for the device to go away,
>> >> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
>> >> it's never exposed to the "broken device" state.  Note: if the drive_del
>> >> fails because the device doesn't exist, we lost the race with the
>> >> automatic destruction, which is harmless.  Ignore that error.
>> >
>> > Honestly, other than describing what happens if you sever the connection
>> > when the guest isn't aware of it; I don't want to try to capture how the
>> > mgmt layer implements the removal.  
>> >
>> > One may want to force the disconnect before attempting to remove the
>> > device; or the other way around; that's really the mgmt layer's call.
>> 
>> Fair enough.
>> 
>> >> > Signed-off-by: Ryan Harper 
>> >> > ---
>> >> >  block.c |7 +++
>> >> >  block.h |1 +
>> >> >  blockdev.c  |   36 
>> >> >  blockdev.h  |1 +
>> >> >  hmp-commands.hx |   18 ++
>> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/block.c b/block.c
>> >> > index 6b505fb..c76a796 100644
>> >> > --- a/block.c
>> >> > +++ b/block.c
>> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, 
>> >> > int removable)
>> >> >  }
>> >> >  }
>> >> >  
>> >> > +void bdrv_unplug(BlockDriverState *bs)
>> >> > +{
>> >> > +qemu_aio_flush();
>> >> > +bdrv_flush(bs);
>> >> > +bdrv_close(bs);
>> >> > +}
>> >> > +
>> >> 
>> >> Unless we expect more users, I'd inline this into its only caller.
>> >> Matter of taste.
>> >
>> > Works for me.
>> >
>> >> 
>> >> >  int bdrv_is_removable(BlockDriverState *bs)
>> >> >  {
>> >> >  return bs->removable;
>> >> > diff --git a/block.h b/block.h
>> >> > index 78ecfac..581414c 100644
>> >> > --- a/block.h
>> >> > +++ b/block.h
>> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
>> >> > BlockErrorAction on_read_error,
>> >> > BlockErrorAction on_write_error);
>> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>> >> >  void bdrv_set_removable(BlockD

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper
* Markus Armbruster  [2010-11-10 11:40]:
> Ryan Harper  writes:
> 
> > * Markus Armbruster  [2010-11-10 06:48]:
> >> One real question, and a couple of nits.
> >> 
> >> Ryan Harper  writes:
> >> 
> >> > Block hot unplug is racy since the guest is required to acknowlege the 
> >> > ACPI
> >> > unplug event; this may not happen synchronously with the device removal 
> >> > command
> >> 
> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> >> length of time, possibly forever.  To make a race, you need to throw in
> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> described in your next paragraph.
> >> 
> >> Moreover, all PCI unplug is that way, not just block.
> >> 
> >> > This series aims to close a gap where by mgmt applications that assume 
> >> > the
> >> > block resource has been removed without confirming that the guest has
> >> > acknowledged the removal may re-assign the underlying device to a second 
> >> > guest
> >> > leading to data leakage.
> >> 
> >> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> >> management application), we run right into the next problem: there is no
> >> way for the management application to reliably disconnect the guest from
> >> a block device.  And that's the problem you're fixing.
> >
> > Yeah, that's the right way to word it; providing a method to forcibly
> > disconnect the guest from the host device.
> >> 
> >> > This series introduces a new montor command to decouple asynchornous 
> >> > device
> >> 
> >> Typos "montor" and "asynchornous".  You might want to use a spell
> >> checker :)
> >> 
> >> Lines are a bit long.  Recommend wrap at column 70.
> >> 
> >> > removal from restricting guest access to a block device.  We do this by 
> >> > creating
> >> > a new monitor command drive_del which maps to a bdrv_unplug() command 
> >> > which
> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
> >> > subsequent
> >> > IO is rejected from the device and the guest will get IO errors but 
> >> > continue to
> >> > function.  In addition to preventing further IO, we clean up state 
> >> > pointers
> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >
> >> > A subsequent device removal command can be issued to remove the device, 
> >> > to which
> >> > the guest may or maynot respond, but as long as the unplugged bit is 
> >> > set, no IO
> >> 
> >> "maynot" is not a word.
> >> 
> >> > will be sumbitted.
> >> 
> >> This suggests to drive_del before device_del, which makes the device
> >> goes through a "broken device" state on its way to unplug.  If the guest
> >> accesses the device in that state, it gets I/O errors.  Not nice.
> >> 
> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> >> it's never exposed to the "broken device" state.  Note: if the drive_del
> >> fails because the device doesn't exist, we lost the race with the
> >> automatic destruction, which is harmless.  Ignore that error.
> >
> > Honestly, other than describing what happens if you sever the connection
> > when the guest isn't aware of it; I don't want to try to capture how the
> > mgmt layer implements the removal.  
> >
> > One may want to force the disconnect before attempting to remove the
> > device; or the other way around; that's really the mgmt layer's call.
> 
> Fair enough.
> 
> >> > Signed-off-by: Ryan Harper 
> >> > ---
> >> >  block.c |7 +++
> >> >  block.h |1 +
> >> >  blockdev.c  |   36 
> >> >  blockdev.h  |1 +
> >> >  hmp-commands.hx |   18 ++
> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 6b505fb..c76a796 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> >> > removable)
> >> >  }
> >> >  }
> >> >  
> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> > +{
> >> > +qemu_aio_flush();
> >> > +bdrv_flush(bs);
> >> > +bdrv_close(bs);
> >> > +}
> >> > +
> >> 
> >> Unless we expect more users, I'd inline this into its only caller.
> >> Matter of taste.
> >
> > Works for me.
> >
> >> 
> >> >  int bdrv_is_removable(BlockDriverState *bs)
> >> >  {
> >> >  return bs->removable;
> >> > diff --git a/block.h b/block.h
> >> > index 78ecfac..581414c 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> >> > BlockErrorAction on_read_error,
> >> > BlockErrorAction on_write_error);
> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> >> > +void bdrv_unplug(BlockDriverState *bs);
> >> >  int bdrv_is_removable(BlockDriverState *bs);
> >> >  int bdrv_

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper
* Markus Armbruster  [2010-11-10 11:40]:
> Ryan Harper  writes:
> 
> > * Markus Armbruster  [2010-11-10 06:48]:
> >> One real question, and a couple of nits.
> >> 
> >> Ryan Harper  writes:
> >> 
> >> > Block hot unplug is racy since the guest is required to acknowlege the 
> >> > ACPI
> >> > unplug event; this may not happen synchronously with the device removal 
> >> > command
> >> 
> >> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> >> length of time, possibly forever.  To make a race, you need to throw in
> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> described in your next paragraph.
> >> 
> >> Moreover, all PCI unplug is that way, not just block.
> >> 
> >> > This series aims to close a gap where by mgmt applications that assume 
> >> > the
> >> > block resource has been removed without confirming that the guest has
> >> > acknowledged the removal may re-assign the underlying device to a second 
> >> > guest
> >> > leading to data leakage.
> >> 
> >> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> >> management application), we run right into the next problem: there is no
> >> way for the management application to reliably disconnect the guest from
> >> a block device.  And that's the problem you're fixing.
> >
> > Yeah, that's the right way to word it; providing a method to forcibly
> > disconnect the guest from the host device.
> >> 
> >> > This series introduces a new montor command to decouple asynchornous 
> >> > device
> >> 
> >> Typos "montor" and "asynchornous".  You might want to use a spell
> >> checker :)
> >> 
> >> Lines are a bit long.  Recommend wrap at column 70.
> >> 
> >> > removal from restricting guest access to a block device.  We do this by 
> >> > creating
> >> > a new monitor command drive_del which maps to a bdrv_unplug() command 
> >> > which
> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
> >> > subsequent
> >> > IO is rejected from the device and the guest will get IO errors but 
> >> > continue to
> >> > function.  In addition to preventing further IO, we clean up state 
> >> > pointers
> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >
> >> > A subsequent device removal command can be issued to remove the device, 
> >> > to which
> >> > the guest may or maynot respond, but as long as the unplugged bit is 
> >> > set, no IO
> >> 
> >> "maynot" is not a word.
> >> 
> >> > will be sumbitted.
> >> 
> >> This suggests to drive_del before device_del, which makes the device
> >> goes through a "broken device" state on its way to unplug.  If the guest
> >> accesses the device in that state, it gets I/O errors.  Not nice.
> >> 
> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> >> it's never exposed to the "broken device" state.  Note: if the drive_del
> >> fails because the device doesn't exist, we lost the race with the
> >> automatic destruction, which is harmless.  Ignore that error.
> >
> > Honestly, other than describing what happens if you sever the connection
> > when the guest isn't aware of it; I don't want to try to capture how the
> > mgmt layer implements the removal.  
> >
> > One may want to force the disconnect before attempting to remove the
> > device; or the other way around; that's really the mgmt layer's call.
> 
> Fair enough.
> 
> >> > Signed-off-by: Ryan Harper 
> >> > ---
> >> >  block.c |7 +++
> >> >  block.h |1 +
> >> >  blockdev.c  |   36 
> >> >  blockdev.h  |1 +
> >> >  hmp-commands.hx |   18 ++
> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 6b505fb..c76a796 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> >> > removable)
> >> >  }
> >> >  }
> >> >  
> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> > +{
> >> > +qemu_aio_flush();
> >> > +bdrv_flush(bs);
> >> > +bdrv_close(bs);
> >> > +}
> >> > +
> >> 
> >> Unless we expect more users, I'd inline this into its only caller.
> >> Matter of taste.
> >
> > Works for me.
> >
> >> 
> >> >  int bdrv_is_removable(BlockDriverState *bs)
> >> >  {
> >> >  return bs->removable;
> >> > diff --git a/block.h b/block.h
> >> > index 78ecfac..581414c 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> >> > BlockErrorAction on_read_error,
> >> > BlockErrorAction on_write_error);
> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> >> > +void bdrv_unplug(BlockDriverState *bs);
> >> >  int bdrv_is_removable(BlockDriverState *bs);
> >> >  int bdrv_

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Markus Armbruster
Ryan Harper  writes:

> * Markus Armbruster  [2010-11-10 06:48]:
>> One real question, and a couple of nits.
>> 
>> Ryan Harper  writes:
>> 
>> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
>> > unplug event; this may not happen synchronously with the device removal 
>> > command
>> 
>> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
>> length of time, possibly forever.  To make a race, you need to throw in
>> a client assuming (incorrectly) that unplug is instantaneous, as
>> described in your next paragraph.
>> 
>> Moreover, all PCI unplug is that way, not just block.
>> 
>> > This series aims to close a gap where by mgmt applications that assume the
>> > block resource has been removed without confirming that the guest has
>> > acknowledged the removal may re-assign the underlying device to a second 
>> > guest
>> > leading to data leakage.
>> 
>> Yes, the incorrect assumption is a problem.  But with that fixed (in the
>> management application), we run right into the next problem: there is no
>> way for the management application to reliably disconnect the guest from
>> a block device.  And that's the problem you're fixing.
>
> Yeah, that's the right way to word it; providing a method to forcibly
> disconnect the guest from the host device.
>> 
>> > This series introduces a new montor command to decouple asynchornous device
>> 
>> Typos "montor" and "asynchornous".  You might want to use a spell
>> checker :)
>> 
>> Lines are a bit long.  Recommend wrap at column 70.
>> 
>> > removal from restricting guest access to a block device.  We do this by 
>> > creating
>> > a new monitor command drive_del which maps to a bdrv_unplug() command which
>> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
>> > subsequent
>> > IO is rejected from the device and the guest will get IO errors but 
>> > continue to
>> > function.  In addition to preventing further IO, we clean up state pointers
>> > between host (BlockDriverState) and guest (DeviceInfo).
>> >
>> > A subsequent device removal command can be issued to remove the device, to 
>> > which
>> > the guest may or maynot respond, but as long as the unplugged bit is set, 
>> > no IO
>> 
>> "maynot" is not a word.
>> 
>> > will be sumbitted.
>> 
>> This suggests to drive_del before device_del, which makes the device
>> goes through a "broken device" state on its way to unplug.  If the guest
>> accesses the device in that state, it gets I/O errors.  Not nice.
>> 
>> Instead, I'd recommend device_del, wait for the device to go away,
>> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
>> it's never exposed to the "broken device" state.  Note: if the drive_del
>> fails because the device doesn't exist, we lost the race with the
>> automatic destruction, which is harmless.  Ignore that error.
>
> Honestly, other than describing what happens if you sever the connection
> when the guest isn't aware of it; I don't want to try to capture how the
> mgmt layer implements the removal.  
>
> One may want to force the disconnect before attempting to remove the
> device; or the other way around; that's really the mgmt layer's call.

Fair enough.

>> > Signed-off-by: Ryan Harper 
>> > ---
>> >  block.c |7 +++
>> >  block.h |1 +
>> >  blockdev.c  |   36 
>> >  blockdev.h  |1 +
>> >  hmp-commands.hx |   18 ++
>> >  5 files changed, 63 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/block.c b/block.c
>> > index 6b505fb..c76a796 100644
>> > --- a/block.c
>> > +++ b/block.c
>> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
>> > removable)
>> >  }
>> >  }
>> >  
>> > +void bdrv_unplug(BlockDriverState *bs)
>> > +{
>> > +qemu_aio_flush();
>> > +bdrv_flush(bs);
>> > +bdrv_close(bs);
>> > +}
>> > +
>> 
>> Unless we expect more users, I'd inline this into its only caller.
>> Matter of taste.
>
> Works for me.
>
>> 
>> >  int bdrv_is_removable(BlockDriverState *bs)
>> >  {
>> >  return bs->removable;
>> > diff --git a/block.h b/block.h
>> > index 78ecfac..581414c 100644
>> > --- a/block.h
>> > +++ b/block.h
>> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
>> > BlockErrorAction on_read_error,
>> > BlockErrorAction on_write_error);
>> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
>> > +void bdrv_unplug(BlockDriverState *bs);
>> >  int bdrv_is_removable(BlockDriverState *bs);
>> >  int bdrv_is_read_only(BlockDriverState *bs);
>> >  int bdrv_is_sg(BlockDriverState *bs);
>> > diff --git a/blockdev.c b/blockdev.c
>> > index 6cb179a..ee8c2ec 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -14,6 +14,8 @@
>> >  #include "qemu-option.h"
>> >  #include "qemu-config.h"
>> >  #include "sysemu.h"
>> > +#include "h

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Ryan Harper
* Markus Armbruster  [2010-11-10 06:48]:
> One real question, and a couple of nits.
> 
> Ryan Harper  writes:
> 
> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > unplug event; this may not happen synchronously with the device removal 
> > command
> 
> Well, I wouldn't call unplug "racy".  It just takes an unpredictable
> length of time, possibly forever.  To make a race, you need to throw in
> a client assuming (incorrectly) that unplug is instantaneous, as
> described in your next paragraph.
> 
> Moreover, all PCI unplug is that way, not just block.
> 
> > This series aims to close a gap where by mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second 
> > guest
> > leading to data leakage.
> 
> Yes, the incorrect assumption is a problem.  But with that fixed (in the
> management application), we run right into the next problem: there is no
> way for the management application to reliably disconnect the guest from
> a block device.  And that's the problem you're fixing.

Yeah, that's the right way to word it; providing a method to forcibly
disconnect the guest from the host device.
> 
> > This series introduces a new montor command to decouple asynchornous device
> 
> Typos "montor" and "asynchornous".  You might want to use a spell
> checker :)
> 
> Lines are a bit long.  Recommend wrap at column 70.
> 
> > removal from restricting guest access to a block device.  We do this by 
> > creating
> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
> > subsequent
> > IO is rejected from the device and the guest will get IO errors but 
> > continue to
> > function.  In addition to preventing further IO, we clean up state pointers
> > between host (BlockDriverState) and guest (DeviceInfo).
> >
> > A subsequent device removal command can be issued to remove the device, to 
> > which
> > the guest may or maynot respond, but as long as the unplugged bit is set, 
> > no IO
> 
> "maynot" is not a word.
> 
> > will be sumbitted.
> 
> This suggests to drive_del before device_del, which makes the device
> goes through a "broken device" state on its way to unplug.  If the guest
> accesses the device in that state, it gets I/O errors.  Not nice.
> 
> Instead, I'd recommend device_del, wait for the device to go away,
> drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
> it's never exposed to the "broken device" state.  Note: if the drive_del
> fails because the device doesn't exist, we lost the race with the
> automatic destruction, which is harmless.  Ignore that error.

Honestly, other than describing what happens if you sever the connection
when the guest isn't aware of it; I don't want to try to capture how the
mgmt layer implements the removal.  

One may want to force the disconnect before attempting to remove the
device; or the other way around; that's really the mgmt layer's call.

> 
> > Signed-off-by: Ryan Harper 
> > ---
> >  block.c |7 +++
> >  block.h |1 +
> >  blockdev.c  |   36 
> >  blockdev.h  |1 +
> >  hmp-commands.hx |   18 ++
> >  5 files changed, 63 insertions(+), 0 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index 6b505fb..c76a796 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> > removable)
> >  }
> >  }
> >  
> > +void bdrv_unplug(BlockDriverState *bs)
> > +{
> > +qemu_aio_flush();
> > +bdrv_flush(bs);
> > +bdrv_close(bs);
> > +}
> > +
> 
> Unless we expect more users, I'd inline this into its only caller.
> Matter of taste.

Works for me.

> 
> >  int bdrv_is_removable(BlockDriverState *bs)
> >  {
> >  return bs->removable;
> > diff --git a/block.h b/block.h
> > index 78ecfac..581414c 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> > BlockErrorAction on_read_error,
> > BlockErrorAction on_write_error);
> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> > +void bdrv_unplug(BlockDriverState *bs);
> >  int bdrv_is_removable(BlockDriverState *bs);
> >  int bdrv_is_read_only(BlockDriverState *bs);
> >  int bdrv_is_sg(BlockDriverState *bs);
> > diff --git a/blockdev.c b/blockdev.c
> > index 6cb179a..ee8c2ec 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -14,6 +14,8 @@
> >  #include "qemu-option.h"
> >  #include "qemu-config.h"
> >  #include "sysemu.h"
> > +#include "hw/qdev.h"
> > +#include "block_int.h"
> >  
> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
> > QTAILQ_HEAD_INITIALIZER(drives);
> >  
> > @@ -597,3 +599,37 @@ int

Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-10 Thread Markus Armbruster
One real question, and a couple of nits.

Ryan Harper  writes:

> Block hot unplug is racy since the guest is required to acknowlege the ACPI
> unplug event; this may not happen synchronously with the device removal 
> command

Well, I wouldn't call unplug "racy".  It just takes an unpredictable
length of time, possibly forever.  To make a race, you need to throw in
a client assuming (incorrectly) that unplug is instantaneous, as
described in your next paragraph.

Moreover, all PCI unplug is that way, not just block.

> This series aims to close a gap where by mgmt applications that assume the
> block resource has been removed without confirming that the guest has
> acknowledged the removal may re-assign the underlying device to a second guest
> leading to data leakage.

Yes, the incorrect assumption is a problem.  But with that fixed (in the
management application), we run right into the next problem: there is no
way for the management application to reliably disconnect the guest from
a block device.  And that's the problem you're fixing.

> This series introduces a new montor command to decouple asynchornous device

Typos "montor" and "asynchornous".  You might want to use a spell
checker :)

Lines are a bit long.  Recommend wrap at column 70.

> removal from restricting guest access to a block device.  We do this by 
> creating
> a new monitor command drive_del which maps to a bdrv_unplug() command which
> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, 
> subsequent
> IO is rejected from the device and the guest will get IO errors but continue 
> to
> function.  In addition to preventing further IO, we clean up state pointers
> between host (BlockDriverState) and guest (DeviceInfo).
>
> A subsequent device removal command can be issued to remove the device, to 
> which
> the guest may or maynot respond, but as long as the unplugged bit is set, no 
> IO

"maynot" is not a word.

> will be sumbitted.

This suggests to drive_del before device_del, which makes the device
goes through a "broken device" state on its way to unplug.  If the guest
accesses the device in that state, it gets I/O errors.  Not nice.

Instead, I'd recommend device_del, wait for the device to go away,
drive_del on time out.  If the guest reacts to the ACPI unplug promptly,
it's never exposed to the "broken device" state.  Note: if the drive_del
fails because the device doesn't exist, we lost the race with the
automatic destruction, which is harmless.  Ignore that error.

> Signed-off-by: Ryan Harper 
> ---
>  block.c |7 +++
>  block.h |1 +
>  blockdev.c  |   36 
>  blockdev.h  |1 +
>  hmp-commands.hx |   18 ++
>  5 files changed, 63 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index 6b505fb..c76a796 100644
> --- a/block.c
> +++ b/block.c
> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
> removable)
>  }
>  }
>  
> +void bdrv_unplug(BlockDriverState *bs)
> +{
> +qemu_aio_flush();
> +bdrv_flush(bs);
> +bdrv_close(bs);
> +}
> +

Unless we expect more users, I'd inline this into its only caller.
Matter of taste.

>  int bdrv_is_removable(BlockDriverState *bs)
>  {
>  return bs->removable;
> diff --git a/block.h b/block.h
> index 78ecfac..581414c 100644
> --- a/block.h
> +++ b/block.h
> @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
> BlockErrorAction on_read_error,
> BlockErrorAction on_write_error);
>  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>  void bdrv_set_removable(BlockDriverState *bs, int removable);
> +void bdrv_unplug(BlockDriverState *bs);
>  int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
> diff --git a/blockdev.c b/blockdev.c
> index 6cb179a..ee8c2ec 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -14,6 +14,8 @@
>  #include "qemu-option.h"
>  #include "qemu-config.h"
>  #include "sysemu.h"
> +#include "hw/qdev.h"
> +#include "block_int.h"
>  
>  static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
> QTAILQ_HEAD_INITIALIZER(drives);
>  
> @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
>  }
>  return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>  }
> +
> +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +const char *id = qdict_get_str(qdict, "id");
> +BlockDriverState *bs;
> +Property *prop;
> +
> +bs = bdrv_find(id);
> +if (!bs) {
> +qerror_report(QERR_DEVICE_NOT_FOUND, id);
> +return -1;
> +}
> +
> +/* quiesce block driver; prevent further io */
> +bdrv_unplug(bs);
> +
> +/* clean up guest state from pointing to host resource by
> + * finding and removing DeviceState "drive" property */
> +for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> + 

[Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()

2010-11-08 Thread Ryan Harper
Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_del which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.  In addition to preventing further IO, we clean up state pointers
between host (BlockDriverState) and guest (DeviceInfo).

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Signed-off-by: Ryan Harper 
---
 block.c |7 +++
 block.h |1 +
 blockdev.c  |   36 
 blockdev.h  |1 +
 hmp-commands.hx |   18 ++
 5 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 6b505fb..c76a796 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int 
removable)
 }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+qemu_aio_flush();
+bdrv_flush(bs);
+bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
 return bs->removable;
diff --git a/block.h b/block.h
index 78ecfac..581414c 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, 
BlockErrorAction on_read_error,
BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 6cb179a..ee8c2ec 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -14,6 +14,8 @@
 #include "qemu-option.h"
 #include "qemu-config.h"
 #include "sysemu.h"
+#include "hw/qdev.h"
+#include "block_int.h"
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = 
QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
 }
 return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, "id");
+BlockDriverState *bs;
+Property *prop;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+/* quiesce block driver; prevent further io */
+bdrv_unplug(bs);
+
+/* clean up guest state from pointing to host resource by
+ * finding and removing DeviceState "drive" property */
+for (prop = bs->peer->info->props; prop && prop->name; prop++) {
+if ((prop->info->type == PROP_TYPE_DRIVE) && 
+(*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
+if (prop->info->free) {
+prop->info->free(bs->peer, prop);
+}
+}
+}
+
+/* clean up host state pointing to guest resource by removing
+ * pointers to guest device in the BlockDriverState */
+bdrv_delete(bs);
+
+return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 653affc..2a0559e 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject 
**ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
 const char *filename, const char *fmt);
+int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e5585ba..d6dc18c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
 {
+.name   = "drive_del",
+.args_type  = "id:s",
+.params = "device",
+.help   = "remove host block device",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_drive_del,
+},
+
+STEXI
+...@item delete @var{device}
+...@findex delete
+Remove host block device.  The result is that guest generated IO is no longer
+submitted against the host device underlying the disk.  Once a drive has
+been deleted, the QEMU Block layer returns -EIO which results in IO 
+errors in the gues