* Markus Armbruster <arm...@redhat.com> [2010-10-29 11:11]: > Ryan Harper <ry...@us.ibm.com> writes: > > > * Markus Armbruster <arm...@redhat.com> [2010-10-29 09:13]: > >> [Note cc: Michael] > >> > >> Ryan Harper <ry...@us.ibm.com> writes: > >> > >> > >> If I understand your patch correctly, the difference between your > >> drive_unplug and my blockdev_del is as follows: > >> > >> * drive_unplug forcefully severs the connection between the host part of > >> the block device and its BlockDriverState. A shell of the host part > >> remains, to be cleaned up later. You need forceful disconnect > >> operation to be able to revoke access to an image whether the guest > >> cooperates or not. Fair enough. > >> > >> * blockdev_del deletes a host part. My current version fails when the > >> host part is in use. I patterned that after netdev_del, which used to > >> work that way, until commit 2ffcb18d: > >> > >> Make netdev_del delete the netdev even when it's in use > >> > >> To hot-unplug guest and host part of a network device, you do: > >> > >> device_del NIC-ID > >> netdev_del NETDEV-ID > >> > >> For PCI devices, device_del merely tells ACPI to unplug the device. > >> The device goes away for real only after the guest processed the ACPI > >> unplug event. > >> > >> You have to wait until then (e.g. by polling info pci) before you can > >> unplug the netdev. Not good. > >> > >> Fix by removing the "in use" check from do_netdev_del(). Deleting a > >> netdev while it's in use is safe; packets simply get routed to the bit > >> bucket. > >> > >> Isn't this the very same problem that's behind your drive_unplug? > > > > Yes it is. > > > >> > >> I'd like to have some consistency among net, block and char device > >> commands, i.e. a common set of operations that work the same for all of > >> them. Can we agree on such a set? > > > > Yeah; the current trouble (or at least what I perceive to be trouble) is > > that in the case where the guest responds to device_del induced ACPI > > removal event; the current qdev code already does the host-side device > > tear down. Not sure if it is OK to do a blockdev_del() immediately > > after the device_del. What happens when we do: > > > > device_del > > ACPI to guest > > blockdev_del /* removes host-side device */ > > Fails in my tree, because the blockdev's still in use. See below. > > > guest responds to ACPI > > qdev calls pci device removal code > > qemu attempts to destroy the associated host-side block > > > > That may just work today; and if not, it shouldn't be hard to fix up the > > code to check for NULLs > > I hate the automatic deletion of host part along with the guest part. > device_del should undo device_add. {block,net,char}dev_{add,del} should > be similarly paired.
Agreed. > > In my blockdev branch, I keep the automatic delete only for backwards > compatibility: if you create the drive with drive_add, it gets > auto-deleted, but if you use blockdev_add, it stays around. But what to do about the case where we're doing drive_add and then a device_del() That's the urgent situation that needs to be resolved. > > >> Even if your drive_unplug shouldn't fit in that set, we might want it as > >> a stop-gap. Depends on how urgent the need for it is. Yet another > >> special-purpose command to be deprecated later. > > > > The fix is urgent; but I'm willing to spin a couple patches if it helps > > get this into better shape. > > Can we agree on a common solution for block and net? That's why I cc'ed > Michael. I didn't see a good way to have block behave the same as net; though I do agree that it would be good to have this be common, long term. > > Currently, we have two different ways: > > * The netdev way: "del" always succeeds > > How can it succeed if the host part is in use? > > If all device models are prepared to deal with a missing host part, we > can delete it right away. > > Else, we need to replace it with a suitable zombie, which is > auto-deleted when it goes out of use. Such zombies are not be visible > elsewhere, in particular, the ID becomes available immediately. > > * The unplug way: "del" fails while in use, "unplug" always succeeds > > Feels a bit cleaner to me. But changing netdev_del might not be > acceptable. > > Either way works for me as an user interface. But I'd rather not have > both. > > Next, we need to consider how to integrate this with the automatic > deletion of drives on qdev destruction. That's too late for unplug, we > want that right in device_del. I'd leave the stupid automatic delete > where it is now, in qdev destruction. The C API need unplug and delete > separate for that. > > > Regardless of the way we choose, we need to think very clearly on how > exactly device models should behave when their host part is missing or a > zombie, and how that behavior appears in the guest. > > For net, making it look exactly like a yanked out network cable would > make sense to me. > > What about block? It seems to me that for block it's like cdrom with no disk, floppy with no media, hard disk that's gone bad. I think we we throw EIO back; it's handled gracefully enough. This is what happens when you do a drive_unplug with my patch; the application using the device gets IO errors. That's expected if a drive were to suddently fail (which is what this looks like). And certainly there is some responsibility at the mgmt console to ensure you're not unplugging a drive that you are currently using. -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com