On Wed, Jun 13, 2012 at 11:10 PM, Eric Blake <ebl...@redhat.com> wrote:
> On 06/13/2012 08:36 AM, Dong Xu Wang wrote:
>> Introduce a new file format:add-cow. The usage can be found at this patch.
>>
>> Signed-off-by: Dong Xu Wang <wdon...@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 87 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..e077fc2
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,87 @@
>> +== General ==
>> +
>> +Raw file format does not support backing_file and copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +While using add-cow, procedures may like this:
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1) Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>
> Make sure we also support a raw file larger than the backing file.
>
Okay. In this version I just truncate the raw file to the same size as the
backing file; I will correct this in v11.

> Does it matter whether the raw file starts life sparse?

No, it works with a sparse raw file.

>
>> +    2) Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3) Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>
> How does this interact with live snapshots/live disk mirroring? Is this
> something where I have to call 'block-stream' to pull data into the new
> raw file on-demand? I take it that test.add-cow is required until the
> block-stream completes? Is there a way, while qemu is still running,
> but after the block-stream is complete, to reassociate the drive with
> the actual raw file instead of the add-cow, or does the add-cow have to
> remain around as long as the qemu process is still running?
>

I did not touch the snapshot/live disk mirroring code much, so you have to
call "block_stream". For now, the add-cow file has to remain as long as the
qemu process is running; I have not implemented a way to use the raw file
directly after the block-stream completes.

>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>
> Okay, but different than network byte order, which means we can't use
> htonl() and friends to convert host bytes into the proper format.
> <endian.h> is not (yet) standardized, although there is talk of adding
> it to the next version of POSIX, in which case htole32() and friends
> would be guaranteed.
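One portable way to read and write the little-endian fields without relying on
<endian.h> is to assemble each value from individual bytes. The following is
only a minimal sketch: the helper names le32_load, le64_load and le32_store are
hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from p[0..3].
 * Works the same on little-endian and big-endian hosts. */
uint32_t le32_load(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: read a 64-bit little-endian value from p[0..7]. */
uint64_t le64_load(const uint8_t *p)
{
    return (uint64_t)le32_load(p) | (uint64_t)le32_load(p + 4) << 32;
}

/* Hypothetical helper: write v to p[0..3] in little-endian byte order. */
void le32_store(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

With the layout quoted in this thread, the version field would then be read as
le32_load(header + 8) and the features field as le64_load(header + 28), where
header points at the first 4096 bytes of the add-cow file.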
>
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +
>> +    Byte  0 -  7:   magic
>> +                    add-cow magic string ("ADD_COW\xff")
>> +
>> +          8 - 11:   version
>> +                    Version number (only valid value is 1 now)
>> +
>> +         12 - 15:   backing_filename_offset
>> +                    Offset in the add-cow file at which the backing file name
>> +                    is stored (NB: The string is not null terminated). 0 if the
>> +                    image doesn't have a backing file.
>
> Mention that if this is not 0, then it must be between 36 and 4094 (a
> file name must be at least 1 byte). What are the semantics if the
> filename is relative?

A relative filename is OK; I tested it just now.

>
>> +
>> +         16 - 19:   backing_filename_size
>> +                    Length of the backing file name in bytes. Undefined if the
>> +                    image doesn't have a backing file.
>
> Better to require 0 if backing_filename_offset is 0, than to leave this
> field undefined; also if backing_filename_offset is non-zero, then this
> must be non-zero. Must be less than 4096-36 to fit in the reserved part
> of the header.
>

Okay.

>> +
>> +         20 - 23:   image_filename_offset
>> +                    Offset in the add-cow file at which the image_file name
>> +                    is stored (NB: The string is not null terminated).
>
> Mention that this must be between 36 and 4094 (a file name must be at
> least 1 byte). What are the semantics if the filename is relative?

A relative filename is OK; I tested it just now.

>
>> +
>> +         24 - 27:   image_filename_size
>> +                    Length of the image_file name in bytes.
>
> If backing_filename_offset is non-zero, then this must be non-zero.
> Must be less than 4096-36 to fit in the reserved part of the header.
>

Yes.

> May image_filename and backing_filename overlap (possible if one is a
> suffix of the other)? Are there any constraints to prevent infinite
> loops, such as forbidding backing_filename and image_filename from
> resolving either to the same file or to the add-cow file?
>

Sorry, I should add code that checks whether image_file is valid. Will fix.
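The bounds suggested in the review above could be checked along the following
lines. This is a sketch under stated assumptions, not code from the patch: the
function names are hypothetical, the limits (offset between 36 and 4094, size
at least 1 and less than 4096-36) are the ones proposed in this thread, and
rejecting overlapping name ranges is an assumption, since whether the two names
may overlap is still an open question here. It does not attempt to detect
names that resolve to the same file or to the add-cow file itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADD_COW_HEADER_SIZE  4096u  /* the COW bitmap starts at this byte */
#define ADD_COW_FIXED_FIELDS 36u    /* bytes 0-35: magic through features */

/* Hypothetical check: a file name of `size` bytes stored at `offset` must
 * lie entirely inside the reserved area (bytes 36..4095). */
static bool name_in_reserved(uint32_t offset, uint32_t size)
{
    if (offset < ADD_COW_FIXED_FIELDS || offset > ADD_COW_HEADER_SIZE - 2u) {
        return false;               /* offset must be between 36 and 4094 */
    }
    if (size == 0 || size >= ADD_COW_HEADER_SIZE - ADD_COW_FIXED_FIELDS) {
        return false;               /* at least 1 byte, less than 4096-36 */
    }
    return size <= ADD_COW_HEADER_SIZE - offset;
}

/* True if the byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len)
 * share at least one byte. Only called on ranges that passed the check
 * above, so the sums cannot overflow. */
static bool ranges_overlap(uint32_t a_off, uint32_t a_len,
                           uint32_t b_off, uint32_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Hypothetical validator for the four filename fields of the header. */
bool add_cow_names_valid(uint32_t backing_off, uint32_t backing_size,
                         uint32_t image_off, uint32_t image_size)
{
    if (!name_in_reserved(image_off, image_size)) {
        return false;               /* image_file name is mandatory */
    }
    if (backing_off == 0) {
        return backing_size == 0;   /* no backing file: size must be 0 */
    }
    if (!name_in_reserved(backing_off, backing_size)) {
        return false;
    }
    /* Assumption: overlapping names are rejected (open question above). */
    return !ranges_overlap(backing_off, backing_size, image_off, image_size);
}
```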
>> +
>> +         28 - 35:   features
>> +                    Currently only 2 feature bits are used:
>> +                    Feature bits:
>> +                        The image uses a backing file:
>> +                        * ADD_COW_F_BACKING_FILE = 0x01.
>> +                        The backing file's format is raw:
>> +                        * ADD_COW_F_BACKING_FORMAT_NO_PROBE = 0x02.
>
> Should this follow the qcow2v3 proposal of splitting into mandatory vs.
> optional feature bits?
>
> I agree that ADD_COW_F_BACKING_FORMAT_NO_PROBE is sufficient to avoid
> security implications, but do we want the extra flexibility of
> specifying the backing format file format rather than just requiring
> probes on all but raw?

Kevin or Stefan, could you comment on this? Thanks.

>
>> +
>> +== Reserved ==
>> +
>> +    Byte 36 - 4095: Reserved field:
>> +                    It is used to make sure COW bitmap field starts at the
>> +                    4096th byte, backing_file name and image_file name will
>> +                    be stored here.
>
> Do we want to keep a fixed-size header, or should we be planning on the
> possibility of future extensions requiring enough other header
> extensions that a variable-sized header would be wiser? That is, I'm
> fine with requiring that the header be a multiple of 4k, but maybe it
> would make sense to have a mandatory header field that states how many
> header pages are present before the COW bitmap begins. In the first
> round of implementation, this header field can be required to be 1 (that
> is, for now, we require exactly 4k header), but having the field would
> let us change in the future to a design with an 8k header to hold more
> metadata as needed.
>

Okay.

>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at the 4096th byte, stores a bitmap related to
>> +backing_file and image_file. The bitmap will track whether the sector in
>> +backing_file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes, So the size of bitmap is
>> +calculated according to virtual size of image_file. In each byte, bit 0 to 7
>> +will track the 1st to 7th cluster in sequence, bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image_file, data
>> +should be loaded from backing_file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image_file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>
> So basically an add-cow image is thin as long as at least one bit is 0,
> and the add-cow wrapper can only be discarded when all bits are 1.
>
> How do you handle the case where the raw image is not an even multiple
> of cluster bytes? That is, do bits that correspond to bytes beyond the
> raw file size have to be in a certain state?
>

Currently, bits that correspond to bytes beyond the raw file size are set to 0.

Thanks a lot for your review, Eric.

> --
> Eric Blake   ebl...@redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
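
The bitmap arithmetic discussed above can be sketched as follows. This is an
illustration only, with hypothetical function names that do not come from the
patch. It assumes 128 sectors of 512 bytes per cluster, that bit 0 of each
byte describes the first of the eight clusters covered by that byte (per the
b7..b0 diagram), and that bits past the end of the image stay 0, so the
"all bits are 1" test only looks at clusters that actually exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE         512u
#define SECTORS_PER_CLUSTER 128u
#define CLUSTER_SIZE        (SECTOR_SIZE * SECTORS_PER_CLUSTER)  /* 65536 */

/* Number of clusters needed to cover virtual_size bytes, rounding up so a
 * partial trailing cluster still gets its own bit. */
uint64_t add_cow_cluster_count(uint64_t virtual_size)
{
    return virtual_size / CLUSTER_SIZE + (virtual_size % CLUSTER_SIZE != 0);
}

/* Size of the COW bitmap in bytes: one bit per cluster, rounded up. */
uint64_t add_cow_bitmap_bytes(uint64_t virtual_size)
{
    uint64_t clusters = add_cow_cluster_count(virtual_size);
    return clusters / 8 + (clusters % 8 != 0);
}

/* True if the cluster containing byte_offset is marked dirty, meaning a
 * read should go to image_file rather than backing_file. */
bool add_cow_is_dirty(const uint8_t *bitmap, uint64_t byte_offset)
{
    assert(bitmap != NULL);
    uint64_t cluster = byte_offset / CLUSTER_SIZE;
    return (bitmap[cluster / 8] >> (cluster % 8)) & 1u;
}

/* Set the bit for the cluster containing byte_offset (done on write). */
void add_cow_mark_dirty(uint8_t *bitmap, uint64_t byte_offset)
{
    assert(bitmap != NULL);
    uint64_t cluster = byte_offset / CLUSTER_SIZE;
    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}

/* True once every existing cluster is dirty. Padding bits beyond the last
 * cluster are ignored because they are expected to remain 0. */
bool add_cow_fully_allocated(const uint8_t *bitmap, uint64_t virtual_size)
{
    assert(bitmap != NULL);
    uint64_t clusters = add_cow_cluster_count(virtual_size);
    for (uint64_t c = 0; c < clusters; c++) {
        if (!((bitmap[c / 8] >> (c % 8)) & 1u)) {
            return false;
        }
    }
    return true;
}
```

For the 8G image used in the example commands, this gives 131072 clusters and
a 16384-byte bitmap following the 4096-byte header.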