On 07/10/2017 03:15 AM, Kashyap Chamarthy wrote:
> This patch documents (including their QMP invocations) all the four
> major kinds of live block operations:
>
>   - `block-stream`
>   - `block-commit`
>   - `drive-mirror` (& `blockdev-mirror`)
>   - `drive-backup` (& `blockdev-backup`)
>
> Things considered while writing this document:
>
>   - Use reStructuredText as markup language (with the goal of generating
>     the HTML output using the Sphinx Documentation Generator). It is
>     gentler on the eye, and can be trivially converted to different
>     formats. (Another reason: upstream QEMU is considering to switch to
>     Sphinx, which uses reStructuredText as its markup language.)
>
>   - Raw QMP JSON output vs. 'qmp-shell'. I debated with myself whether
>     to only show raw QMP JSON output (as that is the canonical
>     representation), or use 'qmp-shell', which takes key-value pairs. I
>     settled on the approach of: for the first occurence of a command,

s/occurence/occurrence/

>     use raw JSON; for subsequent occurences, use 'qmp-shell', with an

and again

>     occasional exception.
>
>   - Usage of `-blockdev` command-line.
>
>   - Usage of 'node-name' vs. file path to refer to disks. While we have
>     `blockdev-{mirror, backup}` as 'node-name'-alternatives for
>     `drive-{mirror, backup}`, the `block-commit` command still operate

s/operate/operates/

>     on file names for parameters 'base' and 'top'. So I added a caveat
>     at the beginning to that effect.
>
>     Refer this related thread that I started (where I learnt
>     `block-stream` was recently reworked to accept 'node-name' for 'top'
>     and 'base' parameters):
>     https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html
>     "[RFC] Making 'block-stream', and 'block-commit' accept node-name"
>
> All commands showed in this document were tested while documenting.
>
> Thanks: Eric Blake for the section: "A note on points-in-time vs file
> names". This useful bit was originally articulated by Eric in his
> KVMForum 2015 presentation, so I included that specific bit in this
> document.
>
> Signed-off-by: Kashyap Chamarthy <kcham...@redhat.com>
> ---
>
> diff --git a/docs/interop/live-block-operations.rst b/docs/interop/live-block-operations.rst
> new file mode 100644
> index 0000000..6580f85
> --- /dev/null
> +++ b/docs/interop/live-block-operations.rst
> @@ -0,0 +1,1088 @@
> +..
> +    Copyright (C) 2017 Red Hat Inc.
> +
> +    This work is licensed under the terms of the GNU GPL, version 2 or
> +    later. See the COPYING file in the top-level directory.

Does this paragraph get rendered in such a way that someone reading an
.html site will wonder where the top-level directory lives? I'm not
sure if it should be a comment local to this file, or if the final
rendered text should mention the license.

Hmm, reading further, it looks like the '..'
followed by indentation serves as a multi-line comment that does not
appear in the rendering; so I think that means I have no recommended
change.

> +Disk image backing chain notation
> +---------------------------------
> +
> +A simple disk image chain. (This can be created live using QMP
> +``blockdev-snapshot-sync``, or offline via ``qemu-img``)::

Do we want to go into details about the command-line arguments to
qemu-img used for offline creation/manipulation of an image in a chain?
I guess it's okay to not worry about it; your focus here is QMP
commands (what can we do while qemu is running) rather than offline
commands.

> +
> +Brief overview of live block QMP primitives
> +-------------------------------------------
> +
> +The following are the four different kinds of live block operations that
> +QEMU block layer supports.
> +
> +(1) ``block-stream``: Live copy of data from backing files into overlay
> +    files.
> +
> +    .. note:: Once the 'stream' operation has finished, three things to
> +              note:
> +
> +                (a) QEMU rewrites the backing chain to remove
> +                    reference to the now-streamed and redundant backing
> +                    file;
> +
> +                (b) the streamed file *itself* won't be removed by QEMU,
> +                    and must be explicitly discarded by the user;
> +
> +                (c) the streamed file remains valid -- i.e. further
> +                    overlays can be created based on it. Refer the
> +                    ``block-stream`` section further below for more
> +                    details.
> +
> +(2) ``block-commit``: Live merge of data from overlay files into backing
> +    files (with the optional goal of removing the overlay file from the
> +    chain). Since QEMU 2.0, this includes "active ``block-commit``"
> +    (i.e. merge the current active layer into the base image).
> +
> +    .. note:: Once the 'commit' operation has finished, there are three
> +              things to note here as well:
> +
> +                (a) QEMU rewrites the backing chain to remove reference
> +                    to now-redundant overlay images that have been
> +                    commited into a backing file;

s/commited/committed/ (several places in the document, I'll just point
it out here)

> +
> +                (b) the commited file *itself* won't be removed by QEMU
> +                    -- it ought to be manually removed;
> +
> +                (c) however, unlike in the case of ``block-stream``, the
> +                    intermediate images will be rendered invalid -- i.e.
> +                    no more further overlays can be created based on
> +                    them. Refer the ``block-commit`` section further
> +                    below for more details.
> +
> +(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize running disk

s/running/a running/

> +    to another image.
> +
> +(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
> +    of a block device to a destination.
> +
> +
> +.. _`Interacting with a QEMU instance`:
> +
> +Interacting with a QEMU instance
> +--------------------------------
> +
> +To show some example invocations of command-line, we will use the
> +following invocation of QEMU, with a QMP server running over UNIX
> +socket::
> +
> +    $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
> +        -M q35 -nodefaults -m 512 \
> +        -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \
> +        -device virtio-blk,drive=node-A,id=virtio0 \
> +        -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait
> +
> +The ``-blockdev`` command-line option, used above, is available from
> +QEMU 2.9 onwards. In the above invocation, notice the ``node-name``
> +parameter that is used to refer to the disk image a.qcow2 ('node-A') --
> +this is a cleaner way to refer to a disk image (as opposed to referring
> +to it by spelling out file paths). So, we will continue to designate a
> +``node-name`` to each further disk image created (either via
> +``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
> +image chain, and continue to refer to the disks using their
> +``node-name`` (where possible, because ``block-commit`` does not yet, as
> +of QEMU 2.9, accept ``node-name`` parameter) when performing various
> +block operations.
> +
> +To interact with the QEMU instance launched above, we will use the
> +``qmp-shell`` (located at: ``qemu/scripts/qmp``, as part of the QEMU
> +source directory) utility, which takes key-value pairs for QMP commands.

s/qmp-shell (...) utility/qmp-shell utility (...)/

> +Invoke it as below (which will also print out the complete raw JSON
> +syntax for reference -- examples in the following sections)::
> +
> +    $ ./qmp-shell -v -p /tmp/qmp-sock
> +    (QEMU)
> +
> +.. note::
> +    In the event we have to repeat a certain QMP command, we will: for
> +    the first occurrence of it, show the ``qmp-shell`` invocation, *and*
> +    the corresponding raw JSON QMP syntax; but for subsequent
> +    invocations, present just the ``qmp-shell`` syntax, and omit the
> +    equivalent JSON output.
> +
> +
> +Example disk image chain
> +------------------------
> +
> +We will use the below disk image chain (and occasionally spelling it
> +out where appropriate) when discussing various primitives::
> +
> +    [A] <-- [B] <-- [C] <-- [D]
> +
> +Where [A] is the original base image; [B] and [C] are intermediate
> +overlay images; image [D] is the active layer -- i.e. live QEMU is
> +writing to it. (The rule of thumb is: live QEMU will always be pointing
> +to the rightmost image in a disk image chain.)
> +
> +The above image chain can be created by invoking
> +``blockdev-snapshot-sync`` commands as following (which shows the
> +creation of overlay image [B]) using the ``qmp-shell`` (our invocation
> +also prints the raw JSON invocation of it)::
> +
> +    (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
> +    {
> +        "execute": "blockdev-snapshot-sync",
> +        "arguments": {
> +            "node-name": "node-A",
> +            "snapshot-file": "b.qcow2",
> +            "format": "qcow2",
> +            "snapshot-node-name": "node-B"
> +        }
> +    }
> +
> +Here, "node-A" is the name QEMU internally uses to refer to the base
> +image [A] -- it is the backing file, based on which the overlay image,
> +[B], is created.
> +
> +To create the rest of the overlay images, [C], and [D] (omitted the raw

s/omitted/omitting/

> +JSON output for brevity)::
> +
> +    (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
> +    (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
> +
> +QMP invocation for ``block-commit``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
> +image [B] into image [A], the invocation is as following::

s/following/follows/

> +
> +    (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
> +    {
> +        "execute": "block-commit",
> +        "arguments": {
> +            "device": "node-D",
> +            "job-id": "job0",
> +            "top": "b.qcow2",
> +            "base": "a.qcow2"
> +        }
> +    }
> +
> +Once the above ``block-commit`` operation has completed, a
> +``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
> +required. The end result being, the backing file of image [C] is

Comes off awkwardly to me, but I'm debating on the best fix.
Perhaps: s/The end result being,/As the end result,/

> +adjusted to point to image [A], and the original 4-image chain will end
> +up being transformed to::
> +
> +
> +Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
> +----------------------------------------------------------------------
> +
> +Synchronize a running disk image chain (all or part of it) to a target
> +image.
> +
> +Again, given our familiar disk image chain::
> +
> +    [A] <-- [B] <-- [C] <-- [D]
> +
> +The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) allows
> +you to copy data from the entire chain into a single target image (which
> +can be located on a different host).
> +
> +Once a 'mirror' job has started, there are two possible actions when a

maybe s/when/while/

> +``drive-mirror`` job is active:
> +
> +(1) Issuing the command ``block-job-cancel`` after it emits the event
> +    ``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
> +    the content from the disk image chain to the target image, [E])
> +    create a point-in-time (which is at the time of *triggering* the
> +    cancel command) copy, contained in image [E], of the the entire disk
> +    image chain (or only the top-most image, depending on the ``sync``
> +    mode).
> +
> +(2) Issuing the command ``block-job-complete`` after it emits the event
> +    ``BLOCK_JOB_COMPLETED``: will, after completing synchronization of
> +    the content, adjust the guest device (i.e. live QEMU) to point to
> +    the target image, and, causing all the new writes from this point on
> +    to happen there. One use case for this is live storage migration.
> +
> +About synchronization modes: The synchronization mode determines
> +*which* part of the disk image chain will be copied to the target.
> +Currently, there are four different kinds:
> +
> +(1) ``full`` -- Synchronize the content of entire disk image chain to
> +    the target
> +
> +(2) ``top`` -- Synchronize only the contents of the top-most disk image
> +    in the chain to the target
> +
> +(3) ``none`` -- Synchronize only the new writes from this point on.
> +
> +    .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
> +              the behavior of ``none`` sychronization mode is different.

s/sychronization/synchronization/

> +              Normally, a ``backup`` job consists of two parts: Anything
> +              that is overwritten by the guest is first copied out to
> +              the backup, and in the background the whole image is
> +              copied from start to end. With ``sync=none``, it's only
> +              the first part.
> +
> +(4) ``incremental`` -- Synchronize content that is described by the
> +    dirty bitmap
> +
> +.. note::
> +    Refer to the :doc:`bitmaps` document in the QEMU source
> +    tree to learn about the detailed workings of the ``incremental``
> +    synchronization mode.
> +
> +
> +QMP invocation for ``drive-mirror``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +.. important::
> +    The destination host must already have the contents of the backing
> +    chain, involving images [A], [B], and [C], visible via other means
> +    -- whether by ``cp``, ``rsync``, or by some storage array-specific
> +    command.)
> +
> +Sometimes, this is also referred to as "shallow copy" -- because: only

s/because:/because/

> +the "active layer", and not the rest of the image chain, is copied to
> +the destination.
> +
> +.. note::
> +    In this example, for the sake of simplicity, we'll be using the same
> +    ``localhost`` as both, source and destination.

s/both,/both/

> +
> +As noted earlier, on the destination host the contents of the backing
> +chain -- from images [A] to [C] -- are already expected to exist in some
> +form (e.g. in a file called, ``Contents-of-A-B-C.qcow2``). Now, on the
> +destination host, let's create a target overlay image (with the image
> +``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
> +of image [D] (from the source QEMU) will be mirrored to::
> +
> +    $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
> +        -F qcow2 ./target-disk.qcow2

Ah, so you DO have one example of an offline use of qemu-img for
manipulating backing chain relationships.

> +
> +And start the destination QEMU (we already have the source QEMU running
> +-- discussed in the section: `Interacting with a QEMU instance`_)
> +instance, with the following invocation. (As noted earlier, for
> +simplicity's sake, the destination QEMU is started on the same host, but
> +it could be located elsewhere)::

libvirt doesn't allow migration to localhost - but that doesn't affect
your example...

> +(6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the
> +    QMP command `cont`::
> +
> +        (QEMU) cont
> +        {
> +            "execute": "cont",
> +            "arguments": {}
> +        }
> +
> +
> +.. note::
> +    Higher-level libraries (e.g. libvirt) automate the entire above
> +    process.

...other than this note. Maybe s/process./process (although note that
libvirt does not allow same-host migrations to localhost for other
reasons)./

Overall, looking good! Content-wise, I think we have a good document,
and it was just a few spelling errors and grammar suggestions, minor
enough that I'm comfortable with you adding:

Reviewed-by: Eric Blake <ebl...@redhat.com>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org