Ryan Harper <ry...@us.ibm.com> writes:

> * Markus Armbruster <arm...@redhat.com> [2010-11-11 04:48]:
>> Ryan Harper <ry...@us.ibm.com> writes:
>>
>> > * Markus Armbruster <arm...@redhat.com> [2010-11-10 11:40]:
>> >> Ryan Harper <ry...@us.ibm.com> writes:
>> >>
>> >> > * Markus Armbruster <arm...@redhat.com> [2010-11-10 06:48]:
>> >> >> One real question, and a couple of nits.
>> >> >>
>> >> >> Ryan Harper <ry...@us.ibm.com> writes:
>> >> >>
>> >> >> > Block hot unplug is racy since the guest is required to acknowlege
>> >> >> > the ACPI
>> >> >> > unplug event; this may not happen synchronously with the device
>> >> >> > removal command
>> >> >>
>> >> >> Well, I wouldn't call unplug "racy". It just takes an unpredictable
>> >> >> length of time, possibly forever. To make a race, you need to throw in
>> >> >> a client assuming (incorrectly) that unplug is instantaneous, as
>> >> >> described in your next paragraph.
>> >> >>
>> >> >> Moreover, all PCI unplug is that way, not just block.
>> >> >>
>> >> >> > This series aims to close a gap where by mgmt applications that
>> >> >> > assume the
>> >> >> > block resource has been removed without confirming that the guest has
>> >> >> > acknowledged the removal may re-assign the underlying device to a
>> >> >> > second guest
>> >> >> > leading to data leakage.
>> >> >>
>> >> >> Yes, the incorrect assumption is a problem. But with that fixed (in
>> >> >> the
>> >> >> management application), we run right into the next problem: there is
>> >> >> no
>> >> >> way for the management application to reliably disconnect the guest
>> >> >> from
>> >> >> a block device. And that's the problem you're fixing.
>> >> >
>> >> > Yeah, that's the right way to word it; providing a method to forcibly
>> >> > disconnect the guest from the host device.
>> >> >>
>> >> >> > This series introduces a new montor command to decouple asynchornous
>> >> >> > device
>> >> >>
>> >> >> Typos "montor" and "asynchornous". You might want to use a spell
>> >> >> checker :)
>> >> >>
>> >> >> Lines are a bit long. Recommend wrap at column 70.
>> >> >>
>> >> >> > removal from restricting guest access to a block device. We do this
>> >> >> > by creating
>> >> >> > a new monitor command drive_del which maps to a bdrv_unplug()
>> >> >> > command which
>> >> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once
>> >> >> > complete, subsequent
>> >> >> > IO is rejected from the device and the guest will get IO errors but
>> >> >> > continue to
>> >> >> > function. In addition to preventing further IO, we clean up state
>> >> >> > pointers
>> >> >> > between host (BlockDriverState) and guest (DeviceInfo).
>> >> >> >
>> >> >> > A subsequent device removal command can be issued to remove the
>> >> >> > device, to which
>> >> >> > the guest may or maynot respond, but as long as the unplugged bit is
>> >> >> > set, no IO
>> >> >>
>> >> >> "maynot" is not a word.
>> >> >>
>> >> >> > will be sumbitted.
>> >> >>
>> >> >> This suggests to drive_del before device_del, which makes the device
>> >> >> goes through a "broken device" state on its way to unplug. If the
>> >> >> guest
>> >> >> accesses the device in that state, it gets I/O errors. Not nice.
>> >> >>
>> >> >> Instead, I'd recommend device_del, wait for the device to go away,
>> >> >> drive_del on time out. If the guest reacts to the ACPI unplug
>> >> >> promptly,
>> >> >> it's never exposed to the "broken device" state. Note: if the
>> >> >> drive_del
>> >> >> fails because the device doesn't exist, we lost the race with the
>> >> >> automatic destruction, which is harmless. Ignore that error.
>> >> >
>> >> > Honestly, other than describing what happens if you sever the connection
>> >> > when the guest isn't aware of it; I don't want to try to capture how the
>> >> > mgmt layer implements the removal.
>> >> >
>> >> > One may want to force the disconnect before attempting to remove the
>> >> > device; or the other way around; that's really the mgmt layer's call.
>> >>
>> >> Fair enough.
>> >>
>> >> >> > Signed-off-by: Ryan Harper <ry...@us.ibm.com>
>> >> >> > ---
>> >> >> >  block.c         |    7 +++++++
>> >> >> >  block.h         |    1 +
>> >> >> >  blockdev.c      |   36 ++++++++++++++++++++++++++++++++++++
>> >> >> >  blockdev.h      |    1 +
>> >> >> >  hmp-commands.hx |   18 ++++++++++++++++++
>> >> >> >  5 files changed, 63 insertions(+), 0 deletions(-)
>> >> >> >
>> >> >> > diff --git a/block.c b/block.c
>> >> >> > index 6b505fb..c76a796 100644
>> >> >> > --- a/block.c
>> >> >> > +++ b/block.c
>> >> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs,
>> >> >> > int removable)
>> >> >> >      }
>> >> >> >  }
>> >> >> >
>> >> >> > +void bdrv_unplug(BlockDriverState *bs)
>> >> >> > +{
>> >> >> > +    qemu_aio_flush();
>> >> >> > +    bdrv_flush(bs);
>> >> >> > +    bdrv_close(bs);
>> >> >> > +}
>> >> >> > +
>> >> >>
>> >> >> Unless we expect more users, I'd inline this into its only caller.
>> >> >> Matter of taste.
>> >> >
>> >> > Works for me.
>> >> >
>> >> >>
>> >> >> >  int bdrv_is_removable(BlockDriverState *bs)
>> >> >> >  {
>> >> >> >      return bs->removable;
>> >> >> > diff --git a/block.h b/block.h
>> >> >> > index 78ecfac..581414c 100644
>> >> >> > --- a/block.h
>> >> >> > +++ b/block.h
>> >> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs,
>> >> >> > BlockErrorAction on_read_error,
>> >> >> >                         BlockErrorAction on_write_error);
>> >> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int
>> >> >> > is_read);
>> >> >> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
>> >> >> > +void bdrv_unplug(BlockDriverState *bs);
>> >> >> >  int bdrv_is_removable(BlockDriverState *bs);
>> >> >> >  int bdrv_is_read_only(BlockDriverState *bs);
>> >> >> >  int bdrv_is_sg(BlockDriverState *bs);
>> >> >> > diff --git a/blockdev.c b/blockdev.c
>> >> >> > index 6cb179a..ee8c2ec 100644
>> >> >> > --- a/blockdev.c
>> >> >> > +++ b/blockdev.c
>> >> >> > @@ -14,6 +14,8 @@
>> >> >> >  #include "qemu-option.h"
>> >> >> >  #include "qemu-config.h"
>> >> >> >  #include "sysemu.h"
>> >> >> > +#include "hw/qdev.h"
>> >> >> > +#include "block_int.h"
>> >> >> >
>> >> >> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives =
>> >> >> > QTAILQ_HEAD_INITIALIZER(drives);
>> >> >> >
>> >> >> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char
>> >> >> > *device,
>> >> >> >      }
>> >> >> >      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>> >> >> >  }
>> >> >> > +
>> >> >> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject
>> >> >> > **ret_data)
>> >> >> > +{
>> >> >> > +    const char *id = qdict_get_str(qdict, "id");
>> >> >> > +    BlockDriverState *bs;
>> >> >> > +    Property *prop;
>> >> >> > +
>> >> >> > +    bs = bdrv_find(id);
>> >> >> > +    if (!bs) {
>> >> >> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
>> >> >> > +        return -1;
>> >> >> > +    }
>> >> >> > +
>> >> >> > +    /* quiesce block driver; prevent further io */
>> >> >> > +    bdrv_unplug(bs);
>> >> >> > +
>> >> >> > +    /* clean up guest state from pointing to host resource by
>> >> >> > +     * finding and removing DeviceState "drive" property */
>> >> >> > +    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>> >> >> > +        if ((prop->info->type == PROP_TYPE_DRIVE) &&
>> >> >> > +            (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer,
>> >> >> > prop) == bs)) {
>> >> >> > +            if (prop->info->free) {
>> >> >> > +                prop->info->free(bs->peer, prop);
>> >> >> > +            }
>> >>
>> >> Your use of prop->info->free() in this context is wrong. More below.
>> >>
>> >> >>
>> >> >> Does this null the drive property? I doubt it. Quick check in the
>> >> >> debugger?
>> >> >>
>> >> >> The free callbacks generally don't zap the properties, because they run
>> >> >> from qdev_free().
>> >> >
>> >> > To be honest; I didn't see anything that looked like "remove this
>> >> > property" in the qdev api. Any pointers?
>> >>
>> >> The closest we have is indeed the Property method free(), but that's not
>> >> quite right. It's really only for use by qdev_free().
>> >>
>> >> > should I be calling qdev_free() on the dev?
>> >>
>> >> No, because then the whole device is gone, not just the property :)
>> >>
>> >> > I don't quite understand
>> >> > the distinction between the info list of properties and the device
>> >> > itself, nor specifically what we need to remove in the drive_del()
>> >> > operation versus the device_del() portion.
>> >>
>> >> device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci"
>> >> device (C type VirtIOPCIProxy).
>> >>
>> >> drive_del destroys something else, namely the block device host part
>> >> (BlockDriverState + DeviceInfo). Obviously, it needs to zap all
>> >> pointers to the host part along with it. Specifically, it needs to zap
>> >> the device's pointer to it.
>> >>
>> >> Example: if a "virtio-blk-pci" device is using drive "foo", then
>> >> "drive_del foo" needs to zap its member block.bs.
>> >>
>> >> Complication: we don't (want to) know what kind of device exactly is
>> >> using the drive. But we do know that a drive property must be
>> >> describing it.
>> >>
>> >> So we search the properties (for (prop...)) for a drive property
>> >> (prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... ==
>> >> bs).
>> >>
>> >> Result:
>> >>
>> >>     BlockDriverState *bs;
>> >>     Property *prop;
>> >>     BlockDriverState **ptr;
>> >>     [...]
>> >>     for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>> >>         if ((prop->info->type == PROP_TYPE_DRIVE)) {
>> >>             ptr = qdev_get_prop_ptr(dev, prop);
>> >>             if (*ptr == bs) {
>> >>                 bdrv_detach(bs, bs->peer);
>> >
>> > Invoking the free method on the drive property does do detach:
>> >
>> > free_drive
>> > {
>> >     BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
>> >
>> >     if (*ptr) {
>> >         bdrv_detach(*ptr, dev);
>> >         blockdev_auto_del(*ptr);
>> >     }
>> > }
>> >
>> > and the bdrv_delete()
>> >
>> > takes out the bs pointer.
>>
>> Which pointer? Which bdrv_delete()?
>
> I suppose it's the BlockDriverState returned from bdrv_find() since I'm
> invoking bdrv_delete(bs);
>
> And I suppose qdev_get_prop_ptr() is returning a different ptr to the
> same bs; in which case we'll still need the null you had suggested?

Yes. The qdev_get_prop_ptr() returns a pointer to the pointer to the bs
you started with.

In other words:

* bs->peer points from bs to the qdev using this drive

* The qdev state contains a pointer back to bs. Example: for
  virtio-blk-pci, that's VirtIOPCIProxy member block.bs.

* a drive property describes that pointer, and qdev_get_prop_ptr()
  returns a pointer to that pointer in the qdev state. Example: for a
  virtio-blk-pci, it returns &DO_UPCAST(VirtIOPCIProxy, pci_dev.qdev,
  bs->peer)->block.bs.

To disconnect drive from qdev, we need to zap both bs->peer and the
pointer to bs in the qdev state.

Clear?

[...]
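
In case a picture in code helps, here is a minimal stand-alone sketch of
the two pointers involved. It uses mock types only: MockBlockDriverState,
MockDevice, mock_get_drive_ptr(), mock_connect() and mock_disconnect() are
all invented for illustration and are not QEMU's real definitions. The
only thing it is meant to mirror is the shape described above: the drive
points at the device, the device points back at the drive, and the
property accessor hands out the address of that back pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for illustration; these are NOT QEMU's real types. */
typedef struct MockDevice MockDevice;

typedef struct MockBlockDriverState {
    MockDevice *peer;             /* plays the role of bs->peer */
} MockBlockDriverState;

struct MockDevice {
    MockBlockDriverState *drive;  /* plays the role of member block.bs */
};

/* Hypothetical analogue of qdev_get_prop_ptr() for a drive property:
 * returns the address of the device's drive pointer, that is, a
 * pointer to the pointer to bs. */
static MockBlockDriverState **mock_get_drive_ptr(MockDevice *dev)
{
    return &dev->drive;
}

/* Set up both directions of the link. */
static void mock_connect(MockBlockDriverState *bs, MockDevice *dev)
{
    bs->peer = dev;
    dev->drive = bs;
}

/* Disconnect drive from device: zap the device's pointer to bs,
 * then zap bs->peer.  Safe to call on an already detached drive. */
static void mock_disconnect(MockBlockDriverState *bs)
{
    MockBlockDriverState **ptr;

    if (!bs->peer) {
        return;                   /* nothing attached */
    }
    ptr = mock_get_drive_ptr(bs->peer);
    assert(*ptr == bs);           /* the device must point back at us */
    *ptr = NULL;                  /* device no longer sees the drive */
    bs->peer = NULL;              /* drive no longer sees the device */
}
```

Clearing only one side would leave a dangling reference on the other,
which is why the sketch zaps both before returning.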