Signed-off-by: Anthony Liguori <aligu...@us.ibm.com> --- docs/specs/qcow2.md | 263 ++++++++++++++++++++++++++++++++++++++++++++++++++ docs/specs/qcow2.txt | 260 ------------------------------------------------- 2 files changed, 263 insertions(+), 260 deletions(-) create mode 100644 docs/specs/qcow2.md delete mode 100644 docs/specs/qcow2.txt
diff --git a/docs/specs/qcow2.md b/docs/specs/qcow2.md new file mode 100644 index 0000000..b9d6cdd --- /dev/null +++ b/docs/specs/qcow2.md @@ -0,0 +1,263 @@ +General +======= + +A qcow2 image file is organized in units of constant size, which are called +(host) clusters. A cluster is the unit in which all allocations are done, +both for actual guest data and for image metadata. + +Likewise, the virtual disk as seen by the guest is divided into (guest) +clusters of the same size. + +All numbers in qcow2 are stored in Big Endian byte order. + +Header +====== + +The first cluster of a qcow2 image contains the file header: + + Byte 0 - 3: magic + QCOW magic string ("QFI\xfb") + + 4 - 7: version + Version number (only valid value is 2) + + 8 - 15: backing_file_offset + Offset into the image file at which the backing file name + is stored (NB: The string is not null terminated). 0 if the + image doesn't have a backing file. + + 16 - 19: backing_file_size + Length of the backing file name in bytes. Must not be + longer than 1023 bytes. Undefined if the image doesn't have + a backing file. + + 20 - 23: cluster_bits + Number of bits that are used for addressing an offset + within a cluster (1 << cluster_bits is the cluster size). + Must not be less than 9 (i.e. 512 byte clusters). + + Note: qemu as of today has an implementation limit of 2 MB + as the maximum cluster size and won't be able to open images + with larger cluster sizes. + + 24 - 31: size + Virtual disk size in bytes + + 32 - 35: crypt_method + 0 for no encryption + 1 for AES encryption + + 36 - 39: l1_size + Number of entries in the active L1 table + + 40 - 47: l1_table_offset + Offset into the image file at which the active L1 table + starts. Must be aligned to a cluster boundary. + + 48 - 55: refcount_table_offset + Offset into the image file at which the refcount table + starts. Must be aligned to a cluster boundary. + + 56 - 59: refcount_table_clusters + Number of clusters that the refcount table occupies + + 60 - 63: nb_snapshots + Number of snapshots contained in the image + + 64 - 71: snapshots_offset + Offset into the image file at which the snapshot table + starts. Must be aligned to a cluster boundary. + +Directly after the image header, optional sections called header extensions can +be stored. Each extension has a structure like the following: + + Byte 0 - 3: Header extension type: + 0x00000000 - End of the header extension area + 0xE2792ACA - Backing file format name + other - Unknown header extension, can be safely + ignored + + 4 - 7: Length of the header extension data + + 8 - n: Header extension data + + n - m: Padding to round up the header extension size to the next + multiple of 8. + +The remaining space between the end of the header extension area and the end of +the first cluster can be used for other data. Usually, the backing file name is +stored there. + + +Host cluster management +======================= + +qcow2 manages the allocation of host clusters by maintaining a reference count +for each host cluster. A refcount of 0 means that the cluster is free, 1 means +that it is used, and >= 2 means that it is used and any write access must +perform a COW (copy on write) operation. + +The refcounts are managed in a two-level table. The first level is called +refcount table and has a variable size (which is stored in the header). The +refcount table can cover multiple clusters, however it needs to be contiguous +in the image file. + +It contains pointers to the second level structures which are called refcount +blocks and are exactly one cluster in size. + +Given a offset into the image file, the refcount of its cluster can be obtained +as follows: + + refcount_block_entries = (cluster_size / sizeof(uint16_t)) + + refcount_block_index = (offset / cluster_size) % refcount_block_entries + refcount_table_index = (offset / cluster_size) / refcount_block_entries + + refcount_block = load_cluster(refcount_table[refcount_table_index]); + return refcount_block[refcount_block_index]; + +Refcount table entry: + + Bit 0 - 8: Reserved (set to 0) + + 9 - 63: Bits 9-63 of the offset into the image file at which the + refcount block starts. Must be aligned to a cluster + boundary. + + If this is 0, the corresponding refcount block has not yet + been allocated. All refcounts managed by this refcount block + are 0. + +Refcount block entry: + + Bit 0 - 15: Reference count of the cluster + + +Cluster mapping +=============== + +Just as for refcounts, qcow2 uses a two-level structure for the mapping of +guest clusters to host clusters. They are called L1 and L2 table. + +The L1 table has a variable size (stored in the header) and may use multiple +clusters, however it must be contiguous in the image file. L2 tables are +exactly one cluster in size. + +Given a offset into the virtual disk, the offset into the image file can be +obtained as follows: + + l2_entries = (cluster_size / sizeof(uint64_t)) + + l2_index = (offset / cluster_size) % l2_entries + l1_index = (offset / cluster_size) / l2_entries + + l2_table = load_cluster(l1_table[l1_index]); + cluster_offset = l2_table[l2_index]; + + return cluster_offset + (offset % cluster_size) + +L1 table entry: + + Bit 0 - 8: Reserved (set to 0) + + 9 - 55: Bits 9-55 of the offset into the image file at which the L2 + table starts. Must be aligned to a cluster boundary. If the + offset is 0, the L2 table and all clusters described by this + L2 table are unallocated. + + 56 - 62: Reserved (set to 0) + + 63: 0 for an L2 table that is unused or requires COW, 1 if its + refcount is exactly one. This information is only accurate + in the active L1 table. + +L2 table entry (for normal clusters): + + Bit 0 - 8: Reserved (set to 0) + + 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a + cluster boundary. If the offset is 0, the cluster is + unallocated. + + 56 - 61: Reserved (set to 0) + + 62: 0 (this cluster is not compressed) + + 63: 0 for a cluster that is unused or requires COW, 1 if its + refcount is exactly one. This information is only accurate + in L2 tables that are reachable from the the active L1 + table. + +L2 table entry (for compressed clusters; x = 62 - (cluster_size - 8)): + + Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a + cluster boundary! + + x+1 - 61: Compressed size of the images in sectors of 512 bytes + + 62: 1 (this cluster is compressed using zlib) + + 63: 0 for a cluster that is unused or requires COW, 1 if its + refcount is exactly one. This information is only accurate + in L2 tables that are reachable from the the active L1 + table. + +If a cluster is unallocated, read requests shall read the data from the backing +file. If there is no backing file or the backing file is smaller than the image, +they shall read zeros for all parts that are not covered by the backing file. + +Snapshots +========= + +qcow2 supports internal snapshots. Their basic principle of operation is to +switch the active L1 table, so that a different set of host clusters are +exposed to the guest. + +When creating a snapshot, the L1 table should be copied and the refcount of all +L2 tables and clusters reachable from this L1 table must be increased, so that +a write causes a COW and isn't visible in other snapshots. + +When loading a snapshot, bit 63 of all entries in the new active L1 table and +all L2 tables referenced by it must be reconstructed from the refcount table +as it doesn't need to be accurate in inactive L1 tables. + +A directory of all snapshots is stored in the snapshot table, a contiguous area +in the image file, whose starting offset and length are given by the header +fields snapshots_offset and nb_snapshots. The entries of the snapshot table +have variable length, depending on the length of ID, name and extra data. + +Snapshot table entry: + + Byte 0 - 7: Offset into the image file at which the L1 table for the + snapshot starts. Must be aligned to a cluster boundary. + + 8 - 11: Number of entries in the L1 table of the snapshots + + 12 - 13: Length of the unique ID string describing the snapshot + + 14 - 15: Length of the name of the snapshot + + 16 - 19: Time at which the snapshot was taken in seconds since the + Epoch + + 20 - 23: Subsecond part of the time at which the snapshot was taken + in nanoseconds + + 24 - 31: Time that the guest was running until the snapshot was + taken in nanoseconds + + 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. + If there is VM state, it starts at the first cluster + described by first L1 table entry that doesn't describe a + regular guest cluster (i.e. VM state is stored like guest + disk content, except that it is stored at offsets that are + larger than the virtual disk presented to the guest) + + 36 - 39: Size of extra data in the table entry (used for future + extensions of the format) + + variable: Extra data for future extensions. Must be ignored. + + variable: Unique ID string for the snapshot (not null terminated) + + variable: Name of the snapshot (not null terminated) diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt deleted file mode 100644 index e792953..0000000 --- a/docs/specs/qcow2.txt +++ /dev/null @@ -1,260 +0,0 @@ -== General == - -A qcow2 image file is organized in units of constant size, which are called -(host) clusters. A cluster is the unit in which all allocations are done, -both for actual guest data and for image metadata. - -Likewise, the virtual disk as seen by the guest is divided into (guest) -clusters of the same size. - -All numbers in qcow2 are stored in Big Endian byte order. - - -== Header == - -The first cluster of a qcow2 image contains the file header: - - Byte 0 - 3: magic - QCOW magic string ("QFI\xfb") - - 4 - 7: version - Version number (only valid value is 2) - - 8 - 15: backing_file_offset - Offset into the image file at which the backing file name - is stored (NB: The string is not null terminated). 0 if the - image doesn't have a backing file. - - 16 - 19: backing_file_size - Length of the backing file name in bytes. Must not be - longer than 1023 bytes. Undefined if the image doesn't have - a backing file. - - 20 - 23: cluster_bits - Number of bits that are used for addressing an offset - within a cluster (1 << cluster_bits is the cluster size). - Must not be less than 9 (i.e. 512 byte clusters). - - Note: qemu as of today has an implementation limit of 2 MB - as the maximum cluster size and won't be able to open images - with larger cluster sizes. - - 24 - 31: size - Virtual disk size in bytes - - 32 - 35: crypt_method - 0 for no encryption - 1 for AES encryption - - 36 - 39: l1_size - Number of entries in the active L1 table - - 40 - 47: l1_table_offset - Offset into the image file at which the active L1 table - starts. Must be aligned to a cluster boundary. - - 48 - 55: refcount_table_offset - Offset into the image file at which the refcount table - starts. Must be aligned to a cluster boundary. - - 56 - 59: refcount_table_clusters - Number of clusters that the refcount table occupies - - 60 - 63: nb_snapshots - Number of snapshots contained in the image - - 64 - 71: snapshots_offset - Offset into the image file at which the snapshot table - starts. Must be aligned to a cluster boundary. - -Directly after the image header, optional sections called header extensions can -be stored. Each extension has a structure like the following: - - Byte 0 - 3: Header extension type: - 0x00000000 - End of the header extension area - 0xE2792ACA - Backing file format name - other - Unknown header extension, can be safely - ignored - - 4 - 7: Length of the header extension data - - 8 - n: Header extension data - - n - m: Padding to round up the header extension size to the next - multiple of 8. - -The remaining space between the end of the header extension area and the end of -the first cluster can be used for other data. Usually, the backing file name is -stored there. - - -== Host cluster management == - -qcow2 manages the allocation of host clusters by maintaining a reference count -for each host cluster. A refcount of 0 means that the cluster is free, 1 means -that it is used, and >= 2 means that it is used and any write access must -perform a COW (copy on write) operation. - -The refcounts are managed in a two-level table. The first level is called -refcount table and has a variable size (which is stored in the header). The -refcount table can cover multiple clusters, however it needs to be contiguous -in the image file. - -It contains pointers to the second level structures which are called refcount -blocks and are exactly one cluster in size. - -Given a offset into the image file, the refcount of its cluster can be obtained -as follows: - - refcount_block_entries = (cluster_size / sizeof(uint16_t)) - - refcount_block_index = (offset / cluster_size) % refcount_block_entries - refcount_table_index = (offset / cluster_size) / refcount_block_entries - - refcount_block = load_cluster(refcount_table[refcount_table_index]); - return refcount_block[refcount_block_index]; - -Refcount table entry: - - Bit 0 - 8: Reserved (set to 0) - - 9 - 63: Bits 9-63 of the offset into the image file at which the - refcount block starts. Must be aligned to a cluster - boundary. - - If this is 0, the corresponding refcount block has not yet - been allocated. All refcounts managed by this refcount block - are 0. - -Refcount block entry: - - Bit 0 - 15: Reference count of the cluster - - -== Cluster mapping == - -Just as for refcounts, qcow2 uses a two-level structure for the mapping of -guest clusters to host clusters. They are called L1 and L2 table. - -The L1 table has a variable size (stored in the header) and may use multiple -clusters, however it must be contiguous in the image file. L2 tables are -exactly one cluster in size. - -Given a offset into the virtual disk, the offset into the image file can be -obtained as follows: - - l2_entries = (cluster_size / sizeof(uint64_t)) - - l2_index = (offset / cluster_size) % l2_entries - l1_index = (offset / cluster_size) / l2_entries - - l2_table = load_cluster(l1_table[l1_index]); - cluster_offset = l2_table[l2_index]; - - return cluster_offset + (offset % cluster_size) - -L1 table entry: - - Bit 0 - 8: Reserved (set to 0) - - 9 - 55: Bits 9-55 of the offset into the image file at which the L2 - table starts. Must be aligned to a cluster boundary. If the - offset is 0, the L2 table and all clusters described by this - L2 table are unallocated. - - 56 - 62: Reserved (set to 0) - - 63: 0 for an L2 table that is unused or requires COW, 1 if its - refcount is exactly one. This information is only accurate - in the active L1 table. - -L2 table entry (for normal clusters): - - Bit 0 - 8: Reserved (set to 0) - - 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a - cluster boundary. If the offset is 0, the cluster is - unallocated. - - 56 - 61: Reserved (set to 0) - - 62: 0 (this cluster is not compressed) - - 63: 0 for a cluster that is unused or requires COW, 1 if its - refcount is exactly one. This information is only accurate - in L2 tables that are reachable from the the active L1 - table. - -L2 table entry (for compressed clusters; x = 62 - (cluster_size - 8)): - - Bit 0 - x: Host cluster offset. This is usually _not_ aligned to a - cluster boundary! - - x+1 - 61: Compressed size of the images in sectors of 512 bytes - - 62: 1 (this cluster is compressed using zlib) - - 63: 0 for a cluster that is unused or requires COW, 1 if its - refcount is exactly one. This information is only accurate - in L2 tables that are reachable from the the active L1 - table. - -If a cluster is unallocated, read requests shall read the data from the backing -file. If there is no backing file or the backing file is smaller than the image, -they shall read zeros for all parts that are not covered by the backing file. - - -== Snapshots == - -qcow2 supports internal snapshots. Their basic principle of operation is to -switch the active L1 table, so that a different set of host clusters are -exposed to the guest. - -When creating a snapshot, the L1 table should be copied and the refcount of all -L2 tables and clusters reachable from this L1 table must be increased, so that -a write causes a COW and isn't visible in other snapshots. - -When loading a snapshot, bit 63 of all entries in the new active L1 table and -all L2 tables referenced by it must be reconstructed from the refcount table -as it doesn't need to be accurate in inactive L1 tables. - -A directory of all snapshots is stored in the snapshot table, a contiguous area -in the image file, whose starting offset and length are given by the header -fields snapshots_offset and nb_snapshots. The entries of the snapshot table -have variable length, depending on the length of ID, name and extra data. - -Snapshot table entry: - - Byte 0 - 7: Offset into the image file at which the L1 table for the - snapshot starts. Must be aligned to a cluster boundary. - - 8 - 11: Number of entries in the L1 table of the snapshots - - 12 - 13: Length of the unique ID string describing the snapshot - - 14 - 15: Length of the name of the snapshot - - 16 - 19: Time at which the snapshot was taken in seconds since the - Epoch - - 20 - 23: Subsecond part of the time at which the snapshot was taken - in nanoseconds - - 24 - 31: Time that the guest was running until the snapshot was - taken in nanoseconds - - 32 - 35: Size of the VM state in bytes. 0 if no VM state is saved. - If there is VM state, it starts at the first cluster - described by first L1 table entry that doesn't describe a - regular guest cluster (i.e. VM state is stored like guest - disk content, except that it is stored at offsets that are - larger than the virtual disk presented to the guest) - - 36 - 39: Size of extra data in the table entry (used for future - extensions of the format) - - variable: Extra data for future extensions. Must be ignored. - - variable: Unique ID string for the snapshot (not null terminated) - - variable: Name of the snapshot (not null terminated) -- 1.7.4.1