On Sun, Jun 30, 2024 at 5:31 PM Nir Soffer <[email protected]> wrote:
>
> I found a strange behavior in qemu-img map - zero/data status depends on page
> cache content.  It looks like a kernel issue since qemu-img map is using
> SEEK_HOLE/DATA (block/file-posix.c line 3111).
>
> Tested with latest qemu on kernel 6.9.6-100.fc39.x86_64. I see similar behavior
> on xfs and ext4 filesystems.
>
> After creating a preallocated image:
>
>     # qemu-img create -f raw -o preallocation=falloc falloc.img 1g
>     Formatting 'falloc.img', fmt=raw size=1073741824 preallocation=falloc
>
> qemu-img map reports the image as sparse (except the first block, which we
> fully allocate):
>
>     # qemu-img map --output json falloc.img
>     [{ "start": 0, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
>     { "start": 4096, "length": 1073737728, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4096}]
>
> This is good for copy or read performance, since we can skip reading the areas
> with data=false, but it is bad for correctness: we cannot preserve the
> allocation of the entire image, since it looks like a sparse image:
>
>     # qemu-img create -f raw sparse.img 1g
>     Formatting 'sparse.img', fmt=raw size=1073741824
>
>     # qemu-img map --output json sparse.img
>     [{ "start": 0, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
>     { "start": 4096, "length": 1073737728, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4096}]
>
> But look what happens when we get some of the image into the page cache:
>
>     # dd if=falloc.img bs=1M count=512 of=/dev/null
>
>     # qemu-img map --output json falloc.img
>     [{ "start": 0, "length": 544210944, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
>     { "start": 544210944, "length": 529530880, "depth": 0, "present": true, "zero": true, "data": false, "offset": 544210944}]
>
> Now half of the image is reported as data=true and half as data=false. If we
> read the entire image all of it is reported as data=true:
>
>     # dd if=falloc.img bs=1M count=1024 of=/dev/null
>
>     # qemu-img map --output json falloc.img
>     [{ "start": 0, "length": 1073741824, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0}]
>
> If we drop caches, the image goes back to (almost) the initial state:
>
>     # sync; echo 1 > /proc/sys/vm/drop_caches
>
>     # qemu-img map --output json falloc.img
>     [{ "start": 0, "length": 16384, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
>     { "start": 16384, "length": 1073725440, "depth": 0, "present": true, "zero": true, "data": false, "offset": 16384}]
>
> Based on the lseek(2) man page, the file system can do anything, but the page
> cache is not mentioned as something that may affect the result of the call:
>
>    Seeking file data and holes
>        Since  Linux  3.1,  Linux  supports the following additional values for
>        whence:
>
>        SEEK_DATA
>               Adjust the file offset to the next location in the file  greater
>               than  or  equal  to offset containing data.  If offset points to
>               data, then the file offset is set to offset.
>
>        SEEK_HOLE
>               Adjust the file offset to the next hole in the file greater than
>               or equal to offset.  If offset points into the middle of a hole,
>               then the file offset is set to offset.  If there is no hole past
>               offset, then the file offset is adjusted to the end of the  file
>               (i.e., there is an implicit hole at the end of any file).
>
>        In both of the above cases, lseek() fails if offset points past the end
>        of the file.
>
>        These  operations  allow  applications to map holes in a sparsely allo‐
>        cated file.  This can be useful for applications such  as  file  backup
>        tools,  which  can save space when creating backups and preserve holes,
>        if they have a mechanism for discovering holes.
>
>        For the purposes of these operations, a hole is  a  sequence  of  zeros
>        that  (normally) has not been allocated in the underlying file storage.
>        However, a filesystem is not obliged to report holes, so  these  opera‐
>        tions  are not a guaranteed mechanism for mapping the storage space ac‐
>        tually allocated to a file.  (Furthermore, a sequence of zeros that ac‐
>        tually has been written to the underlying storage may not  be  reported
>        as  a  hole.)  In the simplest implementation, a filesystem can support
>        the operations by making SEEK_HOLE always return the offset of the  end
>        of  the  file, and making SEEK_DATA always return offset (i.e., even if
>        the location referred to by offset is a hole, it can be  considered  to
>        consist of data that is a sequence of zeros).
>
> On an xfs filesystem we can inspect the actual allocation:
>
>     $ xfs_bmap -v falloc.img
>     falloc.img:
>      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL
>        0: [0..7]:          192..199          0 (192..199)             8
>        1: [8..2097151]:    200..2097343      0 (200..2097343)   2097144
>
>     $ xfs_bmap -v sparse.img
>     sparse.img:
>      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET            TOTAL
>        0: [0..7]:          2097344..2097351  0 (2097344..2097351)       8
>        1: [8..2047]:       2097352..2099391  0 (2097352..2099391)    2040
>        2: [2048..2097151]: hole                                   2095104
>
> Maybe qemu-img should use file system specific APIs like ioctl_xfs_getbmap(2)
> to get more correct and consistent allocation info?

Maybe some kernel filesystem mailing list is a better place to discuss this?
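
If this does go to a filesystem list, a reproducer that does not involve qemu
may help. Below is a minimal sketch in Python that walks a file with lseek(2)
SEEK_DATA/SEEK_HOLE and returns the extents it finds. The function name
map_extents is made up for illustration, and this is not how qemu implements
its mapping; it only assumes a Linux filesystem that supports both whence
values.

```python
import errno
import os


def map_extents(path):
    """Return a list of (start, length, is_data) tuples covering the file.

    Hypothetical helper for illustration: it reports whatever the kernel
    answers for SEEK_DATA/SEEK_HOLE, nothing more.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                # ENXIO: no data at or after offset, so the rest is a hole.
                extents.append((offset, size - offset, False))
                break
            if data_start > offset:
                # The kernel skipped a hole before the next data region.
                extents.append((offset, data_start - offset, False))
            # There is an implicit hole at end of file, so this returns
            # at most the file size.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            extents.append((data_start, hole_start - data_start, True))
            offset = hole_start
    finally:
        os.close(fd)
    return extents
```

Printing map_extents("falloc.img") before and after the dd commands quoted
above should show whether the kernel alone gives the page-cache-dependent
result, without qemu in the picture.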

