On Fri, Nov 03, 2017 at 10:04:57AM +0000, Stefan Hajnoczi wrote:
> On Thu, Nov 02, 2017 at 12:50:39PM -0500, Eric Blake wrote:
> > On 11/02/2017 12:04 PM, Daniel P. Berrange wrote:
> >
> > > vm-a-disk1.qcow2 open - it's just a regular backing file setup.
> > >
> > >>
> > >>>        | (format=qcow2, proto=file)
> > >>>        |
> > >>>        +- vm-a-disk1.qcow2 (qemu-system-XXX)
> > >>>
> > >>> The problem is that many VMs are wanting to use cache-disk1.qcow2 as
> > >>> their disk's backing file, and only one process is permitted to be
> > >>> writing to a disk backing file at any time.
> > >>
> > >> Can you explain a bit more about how many VMs are trying to write to
> > >> the same backing file 'cache-disk1.qcow2'?  I'd assume it's just the
> > >> "immutable" local backing store (once the previous 'mirror' job is
> > >> completed), based on which Nova creates a qcow2 overlay for each
> > >> instance it boots.
> > >
> > > An arbitrary number of vm-*-disk1.qcow2 files could exist, all using
> > > the same cache-disk1.qcow2 image. It's only limited by how many VMs
> > > you can fit on the host. By definition you can only ever have a single
> > > process writing to a qcow2 file though, otherwise corruption will
> > > quickly follow.
> >
> > So if I'm following, your argument is that the local qemu-nbd process is
> > the only one writing to the file, while all other overlays are backed by
> > the NBD process; and then as any one of the VMs reads, the qemu-nbd
> > process pulls those sectors into the local storage as a result.
> > >
> > >> When I pointed this e-mail of yours to Matt Booth on Freenode Nova IRC
> > >> channel, he said the intermediate image (cache-disk1.qcow2) is a COR
> > >> (Copy-On-Read). I realize what COR is -- every time you read a cluster
> > >> from the backing file, you write it locally, to avoid reading it
> > >> again.
> > >
> > > qcow2 doesn't give you COR, only COW. So every read request would have
> > > a miss in cache-disk1.qcow2 and thus have to be fetched from
> > > master-disk1.qcow2. The use of drive-mirror to pull master-disk1.qcow2
> > > contents into cache-disk1.qcow2 makes up for the lack of COR by
> > > populating cache-disk1.qcow2 in the background.
> >
> > Ah, but qcow2 (or more precisely, any protocol qemu BDS) DOES have
> > copy-on-read, built into the block layer.  See qemu-iotest 197 for an
> > example of it in use.  If we use COR correctly, then every initial read
> > request will miss in the cache, but the COR will populate the cache
> > without having to have a background drive-mirror.  A background
> > drive-mirror may still be useful to populate the cache faster, but COR
> > populates the parts you want now, regardless of how fast the background
> > task is running.
>
> -drive copy-on-read=on and the stream block job were added exactly for
> this provisioning use case.  They can be used together.
>
> I was a little surprised that the discussion has been about the mirror
> job rather than the stream job.
>
> One difference between stream and mirror is that stream doesn't pivot
> the image file on completion.  Instead it clears the backing file, so
> the link to the remote server no longer exists.
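To make that concrete, here is a rough sketch of how I'd expect the
pieces to fit together, with copy-on-read and a stream job on the
per-VM overlay as you suggest. Note that the COR here populates each
VM's own overlay rather than the shared cache image; the port number
and drive id are made up, and I've not tested any of this:

  # One qemu-nbd process is the sole writer to the shared cache image
  $ qemu-nbd --persistent --port=10809 --format=qcow2 cache-disk1.qcow2

  # Each VM gets its own local overlay, backed by the NBD export
  # (NBD presents raw data, so the backing format is raw; the virtual
  # size is inherited from the backing image)
  $ qemu-img create -f qcow2 \
      -o backing_file=nbd://localhost:10809,backing_fmt=raw \
      vm-a-disk1.qcow2

  # Boot with copy-on-read, so every read miss populates the overlay
  $ qemu-system-x86_64 ... \
      -drive file=vm-a-disk1.qcow2,format=qcow2,if=virtio,id=disk1,copy-on-read=on

  # Optionally, in each VM's QMP monitor, stream the remaining data
  # into the local overlay in the background; on completion the
  # backing link is cleared rather than pivoted
  { "execute": "block-stream",
    "arguments": { "device": "disk1", "job-id": "stream-disk1" } }

Once the stream job completes, vm-a-disk1.qcow2 is self-contained and
that VM no longer needs the NBD export at all.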
The confusion between 'stream' and 'mirror' is entirely my lack of
understanding. Just substitute whichever makes sense :-)

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|