28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:
28.04.2020 19:18, Eric Blake wrote:
On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:
Hm. I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage). It would be a bit wrong for them to return
ZERO or DATA then.
But OTOH we don’t care about such cases, do we? We need to know whether
ranges are zero, data, or unallocated. If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.
So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.
I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.
Yes. Because they are doing incorrect (or at least undocumented and unreliable)
thing.
Here's some previous mails discussing the same question about what block_status
should actually mean. At the time, I was so scared of the prospect of
something breaking if I changed things that I ended up keeping status quo, so
here we are revisiting the topic several years later, still asking the same
questions.
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html
Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.
But read doesn't work so: it will read data from the bottom layer, not from
this implicit zeroed backing layer. And it is inconsistent. On read data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, accordingly to its definition..
Or, we should at least document current behavior:
BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
layer rather than any backing, set by block. Attention: it may not be set
for drivers without backing support, still data is of course read from
this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
allocation on fs level, which occupies real space on disk.. So, for such
drivers
ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or
(most probably) not,
don't look at ALLOCATED flag, as it is added by generic layer for another
logic,
not related to fs-allocation.
0 means that, most probably, data doesn't occupy space on fs, zero-status is
unknown (most probably non-zero)
That may be right in describing the current situation, but again, needs a GOOD
audit of what we are actually using it for, and whether it is what we really
WANT to be using it for. If we're going to audit/refactor the code, we might
as well get semantics that are actually useful, rather than painfully contorted
to documentation that happens to match our current contorted code.
Honest enough:) I'll try to make a table.
I don't think that reporting fs-allocation status is a bad thing. But I'm sure
that it should be separated from backing-chain-allocated concept.
As a first step, I've don brief analysis of .bdrv_co_block_status of drivers
(attached)
--
Best regards,
Vladimir
= Filter functions =
mirror,commit,backup filters: bdrv_co_block_status_from_backing
blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken
blkdebug: blkdebug_co_block_status
- actually, uses bdrv_co_block_status_from_file,
after additional blkdebug-related things not influincing the result
= raw =
raw: raw_co_block_status
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW is unclean, behavior is broken
Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere
with block_status_above/is_allocated_above loops.
= Format drivers with supports_backing = true =
qed: bdrv_qed_co_block_status
bdi->unallocated_blocks_are_zero = true;
0 - go to backing
ZERO - metadata-zero
DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data
parallels: parallels_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if
no backing
0 - go to backing
DATA | OFFSET_VALID with @map set and @file = file-child :
metadat-allocated-data
qcow2: qcow2_co_block_status
bdi->unallocated_blocks_are_zero = true;
ZERO | OFFSET_VALID with @map set and @file = something :
metadata-allocated-zero
DATA | OFFSET_VALID with @map set and @file = something :
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data (hint, that you'd better dig zeroes in *file)
ZERO : metadata-zero
DATA : compressed clusters. So, it's metadata-allocated-data, but no offset
we can provide
0: go to backing
qcow: qcow_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if
no backing
0: go to backing
DATA: compressed
DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data
vmdk: vmdk_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if
no backing
0: go to backing
ZERO: metadata-zero
DATA with @file set to something: compressed
DATA | OFFSET_VALID with @map and @file set : metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map and @file set :
metadata-allocated-data (with hint)
Summary:
So, obviously unallocated_blocks_are_zero is not needed, as all they behave so
common semantics:
just corresponds to specification:
DATA | ZERO means backing-chain-allocated-at-this-level
ZERO means reads as zero
OFFSET_VALID means that it may be read from @file (which should be child of
this) at @map offset
additional RECURSE is hint to dig zeroes in @file of @map offset (if it is
not set, digging is most probably useless)
of course, nothing here about fs-allocation status or occupying space on disk
= Format drivers with supports_backing = false =
vdi: vdi_co_block_status
bdi->unallocated_blocks_are_zero = true;
0: metadata-unallocated..
in this case vid_co_preadv zeroize qiov. So, it actually should be ZERO
DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data (hint, that you'd better dig zeroes in *file)
vpc: vpc_co_block_status
bdi->unallocated_blocks_are_zero = true;
0: unallocated?
in this case vpc_co_preadv zeroize qiov. So, it actually should be ZERO
DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child :
metadata-allocated-data (hint, that you'd better dig zeroes in *file)
Summary:
Again, unallocated_blocks_are_zero is always true, so it's not actually needed
Moreover, I think drivers should return ZERO by themselves
So common semantics should be the same as for backing-supporting formats,
except for they must always set at least one of DATA and ZERO flags
of course, nothing here about fs-allocation status or occupying space on disk
== Protocol drivers, with no children ==
iscsi, iser: iscsi_co_block_status
bdi->unallocated_blocks_are_zero = iscsilun->lbprz;
but block-status never return 0, if iscsilun->lbprz
always: OFFSET_VALID with @map set and @file = this
also:
DATA : normal-data
0 - fs-unallocated, not zero
ZERO - fs-unallocated, zero
nbd: nbd_client_co_block_status
special case:
ZERO: for offset > nbd EOF (which may be not sector-aligned)
I think, no reason to not report OFFSET_VALID combination,
as byte-accurate EOF is actually internal nbd driver information
otherwise:
always: OFFSET_VALID with @map=@offset and @file = this
also:
DATA: if feature unsupported, so unknown
any combination of DATA and ZERO, mapping NBD protocol,
!DATA means: most probably doesn't occupy space on fs
ZERO means: reads as zero
null-co, null-aio: null_co_block_status
Hmm, is it a protocol driver? :)
always: OFFSET_VALID with @map set and @file = this
also:
ZERO - if s->read_zeroes (read will zeroize qiov)
0 - otherwise (read does nothing)
gluster: qemu_gluster_co_block_status
always: OFFSET_VALID with @map=@offset and @file=this
also:
DATA: if not want_zero, or unknown, or known-data - normal-data
ZERO: fs-unallocated-zero
file-posix: raw_co_block_status
bdi->unallocated_blocks_are_zero = s->discard_zeroes;
but it never report unallocated :)
always: OFFSET_VALID with @map=@offset and @file=this
also:
DATA: if not want_zero, or unknown, or known-data - normal-data
ZERO: fs-unallocated-zero
sheepdog: sd_co_block_status
Hmm. sheepdog is a dead project, yes? Should we deprecate it?
0: fs-unallocated
DATA | OFFSET_VALID with @map=@offset and @file=this
vvfat: vvfat_co_block_status
always: DATA
Summary:
First, unallocated_blocks_are_zero is obviously unused.
Second, I think all should always return at least
OFFSET_VALID with @map=@offset and @file=this, as
it's exactly where from we read in terms of block-layer.
And good to be consistent about it.
Next, OK, seems, most of drivers wants to report
fs-allocation/disk-occupying status,
0 - most probably doesn't occupy space, not-zero
ZERO - most probably doesn't occupy space, zero
DATA - most probably occupy space, but also is a safe default
DATA | ZERO - most probably occupy space, but for some reason known zeroes.
Only nbd can theoretically do it.
= Testing =
test: bdrv_test_co_block_status
always 0. test_sync_op_block_status() tests different scenarios
of bdrv_is_allocated. If we decide to change the behavior, not a
problem to change test results and bdrv_test_co_block_status()
= Drivers without block_status =
A lot of them. Not a problem. Worth mentioning vhdx:
vhdx
bdi->unallocated_blocks_are_zero =
(s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0;
- it is not used, as DATA|ALLOCATED is set for drivers without block_status