----- Original Message -----
> From: "Avi Kivity" <a...@redhat.com>
> To: "Paolo Bonzini" <pbonz...@redhat.com>
> Cc: "Liu Ping Fan" <pingf...@linux.vnet.ibm.com>, qemu-devel@nongnu.org,
> "Anthony Liguori" <anth...@codemonkey.ws>,
> "Marcelo Tosatti" <mtosa...@redhat.com>, "Jan Kiszka"
> <jan.kis...@siemens.com>, "Stefan Hajnoczi"
> <stefa...@gmail.com>
> Sent: Thursday, 25 October 2012 18:28:27
> Subject: Re: [patch v4 05/16] memory: introduce ref,unref interface for
> MemoryRegionOps
>
> On 10/24/2012 09:29 AM, Paolo Bonzini wrote:
> > On 23/10/2012 18:09, Avi Kivity wrote:
> >>> But our interfaces had better support asynchronicity, and indeed
> >>> they do: after you write to the "eject" register, the "up" will
> >>> show the device as present until after destroy is done. This can
> >>> be changed to show the device as present only until after step 4
> >>> is done.
> >>
> >> Let's say we want to eject the hotplug hardware itself (just as an
> >> example). With refcounts, the callback that updates "up" will hold
> >> on to it via refcounts. With stop_machine(), you need to cancel
> >> that callback, or wait for it somehow, or it can arrive after the
> >> stop_machine() and bite you.
> >
> > The callback that updates "up" is for the parent of the hotplug
> > hardware. There is nothing that has to be updated in the hotplug
> > hardware itself.
>
> I meant, as an unrealistic example, hot-unplugging the bridge itself.
> So we have a callback that updates information in the bridge (up
> register state) being called asynchronously.
>
> A more realistic example would be hot-unplug of an HBA, then the block
> layer callback comes back to update the device. So stop_machine()
> would need to cancel all I/O and wait for I/O that cannot be cancelled.

Cancellation+wait would be triggered by isolate (4a) and it would run
outside stop_machine(). We know that stop_machine() will eventually run
because the guest cannot place more requests for the devices to process.

At this point we're here:

> > 4a. close all backends (also cancel or complete all pending I/O)
> >     ^ long latency

but none of this is done in stop_machine(). Once cancellation/wait
finishes, the HBA gives the green light to the parent, which proceeds
as follows:

> > 4b. notify parent that we're done
> > 4ba. parent removes device from its bus
> > 4bb. parent notifies guest
> > 4bc. parent schedules stop_machine(qdev_free(child))
> > 5. a bottom half calls stop_machine(qdev_free(child))

All we're doing in stop_machine() is really calling the destructor,
which, in an isolate-enabled device, only includes calls to
qemu_del_timer, drive_put_ref, memory_region_destroy and the like.

> Maybe my worry about long stop_machine latencies is premature.
> Everyone in the kernel hates it, but the kernel scales a lot more
> than qemu and is in a much better place wrt threading.

stop_machine may indeed require (or at least warmly suggest) a
conversion to isolate of storage devices, in order to reduce the latency
of the destructor. We do not have that many though (the IDE and SCSI
buses, and virtio-blk).

Paolo
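
To make the ordering of steps 4a through 5 concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not QEMU code: the classes Machine, Bus and Device, and the methods isolate(), child_isolated(), schedule_bh() and free() are hypothetical names invented for illustration, and stop_machine() here merely brackets the callback in a log rather than stopping any threads. The point it shows is that the long-latency I/O drain (4a) finishes before stop_machine() starts, so only the short destructor runs inside it.

```python
from collections import deque


class Machine:
    """Toy stand-in for the main loop (hypothetical, not QEMU code)."""

    def __init__(self):
        self.bottom_halves = deque()  # deferred callbacks run by the main loop
        self.log = []                 # records the order in which steps happen

    def schedule_bh(self, fn):
        self.bottom_halves.append(fn)

    def stop_machine(self, fn):
        # Assumption: a real implementation would quiesce all other threads
        # here. This model only marks the window in the log.
        self.log.append("stop_machine: begin")
        fn()
        self.log.append("stop_machine: end")

    def run_bottom_halves(self):
        while self.bottom_halves:
            self.bottom_halves.popleft()()


class Bus:
    """Parent of the device being unplugged (hypothetical)."""

    def __init__(self, machine):
        self.machine = machine
        self.children = []

    def child_isolated(self, child):
        self.children.remove(child)                      # step 4ba
        self.machine.log.append("4ba: removed from bus")
        self.machine.log.append("4bb: guest notified")   # step 4bb
        # Step 4bc: defer the destructor to a bottom half.
        self.machine.schedule_bh(
            lambda: self.machine.stop_machine(child.free))
        self.machine.log.append("4bc: destructor scheduled")


class Device:
    """Isolate-enabled device such as an HBA (hypothetical)."""

    def __init__(self, machine, parent, pending_io):
        self.machine = machine
        self.parent = parent
        self.pending_io = list(range(pending_io))
        self.freed = False

    def isolate(self):
        # Step 4a: cancel or complete all pending I/O. This is the
        # long-latency part and it runs outside stop_machine().
        while self.pending_io:
            self.pending_io.pop()
        self.machine.log.append("4a: I/O drained")
        # Step 4b: tell the parent we are done.
        self.machine.log.append("4b: parent notified")
        self.parent.child_isolated(self)

    def free(self):
        # Step 5: the destructor has no I/O left to wait for.
        assert not self.pending_io
        self.freed = True
        self.machine.log.append("5: destructor ran")


def run_unplug(pending_io=3):
    machine = Machine()
    bus = Bus(machine)
    dev = Device(machine, bus, pending_io)
    bus.children.append(dev)
    dev.isolate()                 # steps 4a to 4bc
    machine.run_bottom_halves()   # step 5
    return machine.log, dev, bus
```

Calling run_unplug() returns the event log together with the device and bus, so the position of "4a: I/O drained" relative to "stop_machine: begin" can be checked directly.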