28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:
28.04.2020 19:18, Eric Blake wrote:
On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:

Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.

I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.

Yes. Because they are doing incorrect (or at least undocumented and unreliable) 
thing.

Here's some previous mails discussing the same question about what block_status 
should actually mean.  At the time, I was so scared of the prospect of 
something breaking if I changed things that I ended up keeping status quo, so 
here we are revisiting the topic several years later, still asking the same 
questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html



Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.

But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk.. So, for such 
drivers

   ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
   (most probably) not,
   don't look at ALLOCATED flag, as it is added by generic layer for another 
logic,
   not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs, zero-status is
   unknown (most probably non-zero)


That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.


Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure 
that it should be separated from backing-chain-allocated concept.


As a first step, I've don brief analysis of .bdrv_co_block_status of drivers 
(attached)

--
Best regards,
Vladimir
= Filter functions =
mirror,commit,backup filters: bdrv_co_block_status_from_backing
blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file

    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
    - semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

blkdebug: blkdebug_co_block_status
    - actually, uses bdrv_co_block_status_from_file,
    after additional blkdebug-related things not influincing the result

= raw =
raw: raw_co_block_status
    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
    - semantics of BDRV_BLOCK_RAW is unclean, behavior is broken

Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere 
with block_status_above/is_allocated_above loops.

= Format drivers with supports_backing = true =

qed: bdrv_qed_co_block_status
    bdi->unallocated_blocks_are_zero = true;
    
    0 - go to backing
    ZERO - metadata-zero
    DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
     
parallels: parallels_co_block_status
    unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

    0 - go to backing
    DATA | OFFSET_VALID with @map set and @file = file-child : 
metadat-allocated-data

qcow2: qcow2_co_block_status
    bdi->unallocated_blocks_are_zero = true;

    ZERO | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-zero
    DATA | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-data
    RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that you'd better dig zeroes in *file)

    ZERO : metadata-zero

    DATA : compressed clusters. So, it's metadata-allocated-data, but no offset 
we can provide

    0: go to backing

qcow: qcow_co_block_status
    unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

    0: go to backing
    DATA: compressed
    DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data

vmdk: vmdk_co_block_status
    unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

    0: go to backing
    ZERO: metadata-zero
    DATA with @file set to something: compressed
    DATA | OFFSET_VALID  with @map and @file set : metadata-allocated-data
    RECURSE | DATA | OFFSET_VALID  with @map and @file set : 
metadata-allocated-data (with hint)

Summary:
  So, obviously unallocated_blocks_are_zero is not needed, as all they behave so

  common semantics:
  just corresponds to specification:
  DATA | ZERO means backing-chain-allocated-at-this-level
  ZERO means reads as zero
  OFFSET_VALID means that it may be read from @file (which should be child of 
this) at @map offset
  additional RECURSE is hint to dig zeroes in @file of @map offset (if it is 
not set, digging is most probably useless)

  of course, nothing here about fs-allocation status or occupying space on disk

= Format drivers with supports_backing = false =

vdi: vdi_co_block_status
    bdi->unallocated_blocks_are_zero = true;

    0: metadata-unallocated.. 
       in this case vid_co_preadv zeroize qiov. So, it actually should be ZERO
    DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
    RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that you'd better dig zeroes in *file)

vpc: vpc_co_block_status
    bdi->unallocated_blocks_are_zero = true;
    
    0: unallocated?
       in this case vpc_co_preadv zeroize qiov. So, it actually should be ZERO
    DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
    RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that you'd better dig zeroes in *file)

Summary:
  Again, unallocated_blocks_are_zero is always true, so it's not actually needed
  Moreover, I think drivers should return ZERO by themselves
  So common semantics should be the same as for backing-supporting formats, 
except for they must always set at least one of DATA and ZERO flags

  of course, nothing here about fs-allocation status or occupying space on disk

== Protocol drivers, with no children ==

iscsi, iser: iscsi_co_block_status
    bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
    but block-status never return 0, if iscsilun->lbprz

    always: OFFSET_VALID with @map set and @file = this

    also:
    DATA : normal-data
    0 - fs-unallocated, not zero
    ZERO - fs-unallocated, zero

nbd: nbd_client_co_block_status

    special case:

        ZERO: for offset > nbd EOF (which may be not sector-aligned)

        I think, no reason to not report OFFSET_VALID combination,
        as byte-accurate EOF is actually internal nbd driver information

    otherwise:

        always: OFFSET_VALID with @map=@offset and @file = this

        also:
        DATA: if feature unsupported, so unknown
        any combination of DATA and ZERO, mapping NBD protocol,
        !DATA means: most probably doesn't occupy space on fs
        ZERO means: reads as zero

null-co, null-aio: null_co_block_status

    Hmm, is it a protocol driver? :)

    always: OFFSET_VALID with @map set and @file = this

    also:
    ZERO - if s->read_zeroes (read will zeroize qiov)
    0 - otherwise (read does nothing)

gluster: qemu_gluster_co_block_status

    always: OFFSET_VALID with @map=@offset and @file=this

    also:
    DATA: if not want_zero, or unknown, or known-data - normal-data
    ZERO: fs-unallocated-zero

file-posix: raw_co_block_status
bdi->unallocated_blocks_are_zero = s->discard_zeroes;
but it never report unallocated :)

    always: OFFSET_VALID with @map=@offset and @file=this

    also:
    DATA: if not want_zero, or unknown, or known-data - normal-data
    ZERO: fs-unallocated-zero

sheepdog: sd_co_block_status

    Hmm. sheepdog is a dead project, yes? Should we deprecate it?

    0: fs-unallocated
    DATA | OFFSET_VALID with @map=@offset and @file=this

vvfat: vvfat_co_block_status

    always: DATA

Summary:
   First, unallocated_blocks_are_zero is obviously unused.
  
   Second, I think all should always return at least
   OFFSET_VALID with @map=@offset and @file=this, as
   it's exactly where from we read in terms of block-layer.
   And good to be consistent about it.

   Next, OK, seems, most of drivers wants to report 
fs-allocation/disk-occupying status,

   0 - most probably doesn't occupy space, not-zero
   ZERO - most probably doesn't occupy space, zero
   DATA - most probably occupy space, but also is a safe default
   DATA | ZERO - most probably occupy space, but for some reason known zeroes. 
Only nbd can theoretically do it.


= Testing =

test: bdrv_test_co_block_status

    always 0.  test_sync_op_block_status() tests different scenarios
    of bdrv_is_allocated. If we decide to change the behavior, not a
    problem to change test results and bdrv_test_co_block_status()

= Drivers without block_status =
A lot of them. Not a problem. Worth mentioning vhdx:
vhdx
    bdi->unallocated_blocks_are_zero =
        (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
    - it is not used, as DATA|ALLOCATED is set for drivers without block_status

Reply via email to