Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 01.03.2018 at 10:57, Vladimir Sementsov-Ogievskiy wrote:
> 01.03.2018 12:48, Kevin Wolf wrote:
> > On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:
> > > 26.02.2018 17:05, Kevin Wolf wrote:
> > > > Essentially, assuming a simple backing chain 'base <- overlay', we got
> > > > these combinations to represent in NBD (with my suggestion of the flags
> > > > to use):
> > > >
> > > > 1. Cluster allocated in overlay
> > > >    a. non-zero data                         0
> > > >    b. explicit zeroes                       0 or ZERO
> > > > 2. Cluster marked zero in overlay           HOLE | ZERO
> > > > 3. Cluster preallocated/zero in overlay     ZERO
> > > > 4. Cluster unallocated in overlay
> > > >    a. Cluster allocated in base (non-zero)  HOLE
> > > >    b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
> > > >    c. Cluster marked zero in base           HOLE | ZERO
> > > >    d. Cluster preallocated/zero in base     HOLE | ZERO
> > > >    e. Cluster unallocated in base           HOLE | ZERO
> > > >
> > > > Instead of 'base' you can read 'anywhere in the backing chain' and the
> > > > flags should stay the same.
> > > I think only "anywhere in the backing chain" is valid here. Otherwise,
> > > semantics of bdrv_is_allocated would differ for NBD and for not-NBD.
> > This was meant as a mapping from cases to flags, not the other way
> > round, so really doesn't say anything about the cases where the block is
> > allocated further down the chain.
> >
> > But yes, it shouldn't make a difference where in the backing chain a
> > block is allocated, so these cases are the same as 4.
> >
> > > I think, if bdrv_is_allocated returns false, it means that we can skip
> > > this region in copying process, am I right?
> > -ENOCONTEXT? Which copying process?
> >
> > There are cases where you want to copy such regions, and other cases
> > where you want to skip them. It depends on the use case. For example,
> > 'qemu-img convert' skips them with -B (because the backing file is
> > reused), but not without -B (which creates a full copy).
> >
> > Kevin
>
> Hm, I thought that bdrv_is_allocated loops through backings, but it doesn't,
> sorry.

That would be bdrv_is_allocated_above() with a NULL base.

Kevin
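The difference between checking a single layer and walking the backing chain, as discussed in the message above, can be sketched with a toy model. This is only an illustration under stated assumptions: ToyNode, toy_is_allocated() and toy_is_allocated_above() are hypothetical names invented here, and the real bdrv_is_allocated() and bdrv_is_allocated_above() take byte ranges and different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node: one allocation flag per cluster.
 * Hypothetical type for illustration, not QEMU's BlockDriverState. */
#define TOY_CLUSTERS 4

typedef struct ToyNode {
    bool allocated[TOY_CLUSTERS];
    struct ToyNode *backing;    /* next image down the chain, or NULL */
} ToyNode;

/* Looks at the given layer only and never follows the backing link. */
static bool toy_is_allocated(const ToyNode *top, size_t cluster)
{
    assert(cluster < TOY_CLUSTERS);
    return top->allocated[cluster];
}

/* Walks from 'top' down the chain and stops before 'base'.
 * With base == NULL the whole chain is searched. */
static bool toy_is_allocated_above(const ToyNode *top, const ToyNode *base,
                                   size_t cluster)
{
    assert(cluster < TOY_CLUSTERS);
    for (const ToyNode *n = top; n != NULL && n != base; n = n->backing) {
        if (n->allocated[cluster]) {
            return true;
        }
    }
    return false;
}
```

With a 'base <- overlay' chain where only base holds cluster 0, the single-layer check on the overlay reports it as unallocated, while the chain walk with a NULL base finds it.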
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
01.03.2018 12:48, Kevin Wolf wrote:
> On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:
> > 26.02.2018 17:05, Kevin Wolf wrote:
> > > Essentially, assuming a simple backing chain 'base <- overlay', we got
> > > these combinations to represent in NBD (with my suggestion of the flags
> > > to use):
> > >
> > > 1. Cluster allocated in overlay
> > >    a. non-zero data                         0
> > >    b. explicit zeroes                       0 or ZERO
> > > 2. Cluster marked zero in overlay           HOLE | ZERO
> > > 3. Cluster preallocated/zero in overlay     ZERO
> > > 4. Cluster unallocated in overlay
> > >    a. Cluster allocated in base (non-zero)  HOLE
> > >    b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
> > >    c. Cluster marked zero in base           HOLE | ZERO
> > >    d. Cluster preallocated/zero in base     HOLE | ZERO
> > >    e. Cluster unallocated in base           HOLE | ZERO
> > >
> > > Instead of 'base' you can read 'anywhere in the backing chain' and the
> > > flags should stay the same.
> > I think only "anywhere in the backing chain" is valid here. Otherwise,
> > semantics of bdrv_is_allocated would differ for NBD and for not-NBD.
> This was meant as a mapping from cases to flags, not the other way
> round, so really doesn't say anything about the cases where the block is
> allocated further down the chain.
>
> But yes, it shouldn't make a difference where in the backing chain a
> block is allocated, so these cases are the same as 4.
>
> > I think, if bdrv_is_allocated returns false, it means that we can skip
> > this region in copying process, am I right?
> -ENOCONTEXT? Which copying process?
>
> There are cases where you want to copy such regions, and other cases
> where you want to skip them. It depends on the use case. For example,
> 'qemu-img convert' skips them with -B (because the backing file is
> reused), but not without -B (which creates a full copy).
>
> Kevin

Hm, I thought that bdrv_is_allocated loops through backings, but it doesn't,
sorry.

--
Best regards,
Vladimir
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:
> 26.02.2018 17:05, Kevin Wolf wrote:
> > Essentially, assuming a simple backing chain 'base <- overlay', we got
> > these combinations to represent in NBD (with my suggestion of the flags
> > to use):
> >
> > 1. Cluster allocated in overlay
> >    a. non-zero data                         0
> >    b. explicit zeroes                       0 or ZERO
> > 2. Cluster marked zero in overlay           HOLE | ZERO
> > 3. Cluster preallocated/zero in overlay     ZERO
> > 4. Cluster unallocated in overlay
> >    a. Cluster allocated in base (non-zero)  HOLE
> >    b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
> >    c. Cluster marked zero in base           HOLE | ZERO
> >    d. Cluster preallocated/zero in base     HOLE | ZERO
> >    e. Cluster unallocated in base           HOLE | ZERO
> >
> > Instead of 'base' you can read 'anywhere in the backing chain' and the
> > flags should stay the same.
>
> I think only "anywhere in the backing chain" is valid here. Otherwise,
> semantics of bdrv_is_allocated would differ for NBD and for not-NBD.

This was meant as a mapping from cases to flags, not the other way
round, so really doesn't say anything about the cases where the block is
allocated further down the chain.

But yes, it shouldn't make a difference where in the backing chain a
block is allocated, so these cases are the same as 4.

> I think, if bdrv_is_allocated returns false, it means that we can skip
> this region in copying process, am I right?

-ENOCONTEXT? Which copying process?

There are cases where you want to copy such regions, and other cases
where you want to skip them. It depends on the use case. For example,
'qemu-img convert' skips them with -B (because the backing file is
reused), but not without -B (which creates a full copy).

Kevin
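For readers who prefer code to a table, the case list quoted in this message can be written down as a lookup. This is a minimal sketch, not code from any posted patch: the ClusterCase enum and suggested_nbd_flags() are names invented here, the two flag bits follow the bit positions given in the NBD text quoted elsewhere in this thread (HOLE is bit 0, ZERO is bit 1), and where the table allows two answers the sketch arbitrarily picks the first one listed.

```c
#include <assert.h>

/* Bit positions as given in the quoted NBD text; defined locally so the
 * sketch is self-contained. */
enum {
    NBD_STATE_HOLE = 1 << 0,
    NBD_STATE_ZERO = 1 << 1,
};

/* Hypothetical labels for the numbered cases in the table. */
typedef enum {
    OVERLAY_DATA,             /* 1a */
    OVERLAY_EXPLICIT_ZEROES,  /* 1b: 0 or ZERO */
    OVERLAY_ZERO_MARKED,      /* 2 */
    OVERLAY_PREALLOC_ZERO,    /* 3 */
    BASE_DATA,                /* 4a */
    BASE_EXPLICIT_ZEROES,     /* 4b: HOLE or HOLE | ZERO */
    BASE_ZERO_MARKED,         /* 4c */
    BASE_PREALLOC_ZERO,       /* 4d */
    BASE_UNALLOCATED,         /* 4e */
    CLUSTER_CASE_COUNT
} ClusterCase;

/* Returns the flags suggested in the table for the given case. */
static unsigned suggested_nbd_flags(ClusterCase c)
{
    static const unsigned table[CLUSTER_CASE_COUNT] = {
        [OVERLAY_DATA]            = 0,
        [OVERLAY_EXPLICIT_ZEROES] = 0,
        [OVERLAY_ZERO_MARKED]     = NBD_STATE_HOLE | NBD_STATE_ZERO,
        [OVERLAY_PREALLOC_ZERO]   = NBD_STATE_ZERO,
        [BASE_DATA]               = NBD_STATE_HOLE,
        [BASE_EXPLICIT_ZEROES]    = NBD_STATE_HOLE,
        [BASE_ZERO_MARKED]        = NBD_STATE_HOLE | NBD_STATE_ZERO,
        [BASE_PREALLOC_ZERO]      = NBD_STATE_HOLE | NBD_STATE_ZERO,
        [BASE_UNALLOCATED]        = NBD_STATE_HOLE | NBD_STATE_ZERO,
    };

    assert((unsigned)c < CLUSTER_CASE_COUNT);
    return table[c];
}
```

Every case 4 entry carries HOLE regardless of what the backing chain holds, which is the point of reading 'base' as 'anywhere in the backing chain'.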
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
26.02.2018 17:05, Kevin Wolf wrote:
> On 24.02.2018 at 00:38, Eric Blake wrote:
> > On 02/23/2018 11:05 AM, Kevin Wolf wrote:
> > > On 23.02.2018 at 17:43, Eric Blake wrote:
> > > > > OFFSET_VALID | DATA might be excusable because I can see that it's
> > > > > convenient that a protocol driver refers to itself as *file instead of
> > > > > returning NULL there and then the offset is valid (though it would be
> > > > > pointless to actually follow the file pointer), but OFFSET_VALID without
> > > > > DATA probably isn't.
> > > >
> > > > So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> > > > necessary to avoid breaking qemu-img map output. But you are also right
> > > > that OFFSET_VALID without data makes little sense at a protocol layer. So
> > > > with that in mind, I'm auditing all of the protocol layers to make sure
> > > > OFFSET_VALID ends up as something sane.
> > >
> > > That's one way to look at it.
> > >
> > > The other way is that qemu-img map shouldn't ask the protocol layer for
> > > its offset because it already knows the offset (it is what it passes as
> > > a parameter to bdrv_co_block_status).
> > >
> > > Anyway, it's probably not worth changing the interface, we should just
> > > make sure that the return values of the individual drivers are
> > > consistent.
> >
> > Yet another inconsistency, and it's making me scratch my head today.
> >
> > By the way, in my byte-based stuff that is now pending on your tree, I tried
> > hard to NOT change semantics or the set of flags returned by a given driver,
> > and we agreed that's why you'd accept the series as-is and make me do this
> > followup exercise. But it's looking like my followups may end up touching a
> > lot of the same drivers again, now that I'm looking at what the semantics
> > SHOULD be (and whatever I do end up tweaking, I will at least make sure that
> > iotests is still happy with it).
>
> Hm, that's unfortunate, but I don't think we should hold up your first
> series just so we can touch the drivers only once.
>
> > First, let's read what states the NBD spec is proposing:
> >
> > > It defines the following flags for the flags field:
> > >
> > > NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future
> > > writes to that area may cause fragmentation or encounter an ENOSPC error);
> > > if clear, the block is allocated or the server could not otherwise
> > > determine its status. Note that the use of NBD_CMD_TRIM is related to this
> > > status, but that the server MAY report a hole even where NBD_CMD_TRIM has
> > > not been requested, and also that a server MAY report that the block is
> > > allocated even where NBD_CMD_TRIM has been requested.
> > > NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes;
> > > if clear, the block contents are not known. Note that the use of
> > > NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY
> > > report zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and
> > > also that a server MAY report unknown content even where
> > > NBD_CMD_WRITE_ZEROES has been requested.
> > >
> > > It is not an error for a server to report that a region of the export has
> > > both NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an
> > > area are undefined, and a client reading such an area should make no
> > > assumption as to its contents or stability.
> >
> > So here's how Vladimir proposed implementing it in his series (written
> > before my byte-based block status stuff went in to your tree):
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html
> >
> > Server side (3/9):
> >
> > +    int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, ,
> > +                                      NULL, NULL);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
> > +            (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
> >
> > Client side (6/9):
> >
> > +    *pnum = extent.length >> BDRV_SECTOR_BITS;
> > +    return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
> > +           (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
> >
> > Does anything there strike you as odd?
>
> Two things I noticed while reading the above:
>
> 1. NBD doesn't consider backing files, so the definition of holes
>    becomes ambiguous. Is a hole any block that isn't allocated in the
>    top layer (may cause fragmentation or encounter an ENOSPC error) or
>    is it any block that isn't allocated anywhere in the whole backing
>    chain (may read as non-zero)?
>
>    Considering that there is a separate NBD_STATE_ZERO and nothing
>    forbids a state of NBD_STATE_HOLE without NBD_STATE_ZERO, maybe the
>    former is more useful. The code you quote implements the latter.
>
>    Maybe if we go with the former, we should add a note to the NBD spec
>    that explicitly says that NBD_STATE_HOLE doesn't imply any specific
>    content that is returned on reads.
>
> 2. Using BDRV_BLOCK_ALLOCATED to determine NBD_STATE_HOLE seems wrong. A
>    (not preallocated) zero cluster in qcow2 returns BDRV_BLOCK_ALLOCATED
>    (because we don't fall through to the backing file) even though I
>    think it's a hole. BDRV_BLOCK_DATA should be used there (which makes
>    it consistent with
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 24.02.2018 at 00:38, Eric Blake wrote:
> On 02/23/2018 11:05 AM, Kevin Wolf wrote:
> > On 23.02.2018 at 17:43, Eric Blake wrote:
> > > > OFFSET_VALID | DATA might be excusable because I can see that it's
> > > > convenient that a protocol driver refers to itself as *file instead of
> > > > returning NULL there and then the offset is valid (though it would be
> > > > pointless to actually follow the file pointer), but OFFSET_VALID without
> > > > DATA probably isn't.
> > >
> > > So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> > > necessary to avoid breaking qemu-img map output. But you are also right
> > > that OFFSET_VALID without data makes little sense at a protocol layer. So
> > > with that in mind, I'm auditing all of the protocol layers to make sure
> > > OFFSET_VALID ends up as something sane.
> >
> > That's one way to look at it.
> >
> > The other way is that qemu-img map shouldn't ask the protocol layer for
> > its offset because it already knows the offset (it is what it passes as
> > a parameter to bdrv_co_block_status).
> >
> > Anyway, it's probably not worth changing the interface, we should just
> > make sure that the return values of the individual drivers are
> > consistent.
>
> Yet another inconsistency, and it's making me scratch my head today.
>
> By the way, in my byte-based stuff that is now pending on your tree, I tried
> hard to NOT change semantics or the set of flags returned by a given driver,
> and we agreed that's why you'd accept the series as-is and make me do this
> followup exercise. But it's looking like my followups may end up touching a
> lot of the same drivers again, now that I'm looking at what the semantics
> SHOULD be (and whatever I do end up tweaking, I will at least make sure that
> iotests is still happy with it).

Hm, that's unfortunate, but I don't think we should hold up your first
series just so we can touch the drivers only once.

> First, let's read what states the NBD spec is proposing:
>
> > It defines the following flags for the flags field:
> >
> > NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future
> > writes to that area may cause fragmentation or encounter an ENOSPC error);
> > if clear, the block is allocated or the server could not otherwise
> > determine its status. Note that the use of NBD_CMD_TRIM is related to this
> > status, but that the server MAY report a hole even where NBD_CMD_TRIM has
> > not been requested, and also that a server MAY report that the block is
> > allocated even where NBD_CMD_TRIM has been requested.
> > NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes;
> > if clear, the block contents are not known. Note that the use of
> > NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY
> > report zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and
> > also that a server MAY report unknown content even where
> > NBD_CMD_WRITE_ZEROES has been requested.
> >
> > It is not an error for a server to report that a region of the export has
> > both NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an
> > area are undefined, and a client reading such an area should make no
> > assumption as to its contents or stability.
>
> So here's how Vladimir proposed implementing it in his series (written
> before my byte-based block status stuff went in to your tree):
> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html
>
> Server side (3/9):
>
> +    int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, ,
> +                                      NULL, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
> +            (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
>
> Client side (6/9):
>
> +    *pnum = extent.length >> BDRV_SECTOR_BITS;
> +    return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
> +           (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
>
> Does anything there strike you as odd?

Two things I noticed while reading the above:

1. NBD doesn't consider backing files, so the definition of holes
   becomes ambiguous. Is a hole any block that isn't allocated in the
   top layer (may cause fragmentation or encounter an ENOSPC error) or
   is it any block that isn't allocated anywhere in the whole backing
   chain (may read as non-zero)?

   Considering that there is a separate NBD_STATE_ZERO and nothing
   forbids a state of NBD_STATE_HOLE without NBD_STATE_ZERO, maybe the
   former is more useful. The code you quote implements the latter.

   Maybe if we go with the former, we should add a note to the NBD spec
   that explicitly says that NBD_STATE_HOLE doesn't imply any specific
   content that is returned on reads.

2. Using BDRV_BLOCK_ALLOCATED to determine NBD_STATE_HOLE seems wrong. A
   (not
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 02/23/2018 11:05 AM, Kevin Wolf wrote:
> On 23.02.2018 at 17:43, Eric Blake wrote:
> > > OFFSET_VALID | DATA might be excusable because I can see that it's
> > > convenient that a protocol driver refers to itself as *file instead of
> > > returning NULL there and then the offset is valid (though it would be
> > > pointless to actually follow the file pointer), but OFFSET_VALID without
> > > DATA probably isn't.
> >
> > So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> > necessary to avoid breaking qemu-img map output. But you are also right
> > that OFFSET_VALID without data makes little sense at a protocol layer. So
> > with that in mind, I'm auditing all of the protocol layers to make sure
> > OFFSET_VALID ends up as something sane.
>
> That's one way to look at it.
>
> The other way is that qemu-img map shouldn't ask the protocol layer for
> its offset because it already knows the offset (it is what it passes as
> a parameter to bdrv_co_block_status).
>
> Anyway, it's probably not worth changing the interface, we should just
> make sure that the return values of the individual drivers are
> consistent.

Yet another inconsistency, and it's making me scratch my head today.

By the way, in my byte-based stuff that is now pending on your tree, I tried
hard to NOT change semantics or the set of flags returned by a given driver,
and we agreed that's why you'd accept the series as-is and make me do this
followup exercise. But it's looking like my followups may end up touching a
lot of the same drivers again, now that I'm looking at what the semantics
SHOULD be (and whatever I do end up tweaking, I will at least make sure that
iotests is still happy with it).

First, let's read what states the NBD spec is proposing:

> It defines the following flags for the flags field:
>
> NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future
> writes to that area may cause fragmentation or encounter an ENOSPC error);
> if clear, the block is allocated or the server could not otherwise
> determine its status. Note that the use of NBD_CMD_TRIM is related to this
> status, but that the server MAY report a hole even where NBD_CMD_TRIM has
> not been requested, and also that a server MAY report that the block is
> allocated even where NBD_CMD_TRIM has been requested.
> NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes;
> if clear, the block contents are not known. Note that the use of
> NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY
> report zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and
> also that a server MAY report unknown content even where
> NBD_CMD_WRITE_ZEROES has been requested.
>
> It is not an error for a server to report that a region of the export has
> both NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an
> area are undefined, and a client reading such an area should make no
> assumption as to its contents or stability.

So here's how Vladimir proposed implementing it in his series (written
before my byte-based block status stuff went in to your tree):
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html

Server side (3/9):

+    int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, ,
+                                      NULL, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+            (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);

Client side (6/9):

+    *pnum = extent.length >> BDRV_SECTOR_BITS;
+    return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
+           (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);

Does anything there strike you as odd? In isolation, they seemed fine to
me, but side-by-side, I'm scratching my head: the server queries the block
layer, and turns BDRV_BLOCK_ALLOCATED into !NBD_STATE_HOLE; the client side
then takes the NBD protocol and tries to turn it back into information to
feed the block layer, where !NBD_STATE_HOLE now feeds BDRV_BLOCK_DATA. Why
the different choice of bits?

Part of the story is that right now, we document that ONLY the block layer
sets _ALLOCATED, in io.c, as a result of the driver layer returning
DATA || ZERO (there are cases where the block layer can return ZERO but not
ALLOCATED, because the driver layer returned 0 but the block layer still
knows that area reads as zero). So Vladimir's patch matches the fact that
the driver shouldn't set ALLOCATED. Still, if we are tying ALLOCATED to
whether there is a hole, then that seems like information we should be
getting from the driver, not something synthesized after we've left the
driver!

Then there's the question of file-posix.c: what should it return for a
hole, ZERO|OFFSET_VALID or DATA|ZERO|OFFSET_VALID? The wording in block.h
implies that if DATA is not set, then the area reads as zero to the guest,
but may have indeterminate value on the underlying file - but we KNOW
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 23.02.2018 at 17:43, Eric Blake wrote:
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
>
> So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> necessary to avoid breaking qemu-img map output. But you are also right
> that OFFSET_VALID without data makes little sense at a protocol layer. So
> with that in mind, I'm auditing all of the protocol layers to make sure
> OFFSET_VALID ends up as something sane.

That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.

Kevin
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 02/14/2018 06:05 AM, Kevin Wolf wrote:
> > +static int coroutine_fn null_co_block_status(BlockDriverState *bs,
> >      if (s->read_zeroes) {
> > -        return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
> > -    } else {
> > -        return BDRV_BLOCK_OFFSET_VALID | start;
> > +        ret |= BDRV_BLOCK_ZERO;
> >      }
> > +    return ret;
> >  }
>
> Preexisting, but I think this return value is wrong. OFFSET_VALID
> without DATA is documented to have the following semantics:
>
>  * DATA ZERO OFFSET_VALID
>  *  f    t        t       sectors preallocated, read as zero, returned
>  *                        file not necessarily zero at offset
>  *  f    f        t       sectors preallocated but read from backing_hd,
>  *                        returned file contains garbage at offset
>
> I'm not sure what OFFSET_VALID is even supposed to mean for null.

I'm finally getting around to playing with this.

> Or in fact, what it is supposed to mean for any protocol driver, because
> normally it just means I can use this offset for accessing bs->file. But
> protocol drivers don't have a bs->file, so it's interesting to see that
> they still all set this flag.

More precisely, it means "I can use this offset for accessing the returned
*file". Format and filter drivers set *file = bs->file (i.e. their protocol
layer), but protocol drivers set *file = bs (i.e. themselves). As long as
you read it as "the offset is valid in the returned *file", and are careful
as to _which_ BDS gets returned in *file, it can still make sense.

So next I tried playing with a patch, to see how much returning
OFFSET_VALID with DATA matters; and it turns out it is easily observable
anywhere that the underlying protocol bleeds through to the format layer
(particularly the raw format driver):

$ echo abc > tmp
$ truncate --size=10M tmp

pre-patch:

$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": false, "offset": 4096}]

turn off OFFSET_VALID at the protocol layer:

diff --git i/block/file-posix.c w/block/file-posix.c
index f1591c38490..c05992c1121 100644
--- i/block/file-posix.c
+++ w/block/file-posix.c
@@ -2158,9 +2158,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     if (!want_zero) {
         *pnum = bytes;
-        *map = offset;
-        *file = bs;
-        return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+        return BDRV_BLOCK_DATA;
     }

     ret = find_allocation(bs, offset, , );
@@ -2183,9 +2181,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         *pnum = MIN(bytes, data - offset);
         ret = BDRV_BLOCK_ZERO;
     }
-    *map = offset;
-    *file = bs;
-    return ret | BDRV_BLOCK_OFFSET_VALID;
+    return ret;
 }

 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,

post-patch:

$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": false}]

> OFFSET_VALID | DATA might be excusable because I can see that it's
> convenient that a protocol driver refers to itself as *file instead of
> returning NULL there and then the offset is valid (though it would be
> pointless to actually follow the file pointer), but OFFSET_VALID without
> DATA probably isn't.

So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
necessary to avoid breaking qemu-img map output. But you are also right
that OFFSET_VALID without data makes little sense at a protocol layer. So
with that in mind, I'm auditing all of the protocol layers to make sure
OFFSET_VALID ends up as something sane.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 14.02.2018 at 15:44, Eric Blake wrote:
> On 02/14/2018 06:05 AM, Kevin Wolf wrote:
> > On 13.02.2018 at 21:26, Eric Blake wrote:
> > > We are gradually moving away from sector-based interfaces, towards
> > > byte-based. Update the null driver accordingly.
> > >
> > > Signed-off-by: Eric Blake
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy
> > > Reviewed-by: Fam Zheng
> > >
> > >      if (s->read_zeroes) {
> > > -        return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
> > > -    } else {
> > > -        return BDRV_BLOCK_OFFSET_VALID | start;
> > > +        ret |= BDRV_BLOCK_ZERO;
> > >      }
> > > +    return ret;
> > >  }
> >
> > Preexisting, but I think this return value is wrong. OFFSET_VALID
> > without DATA is documented to have the following semantics:
> >
> >  * DATA ZERO OFFSET_VALID
> >  *  f    t        t       sectors preallocated, read as zero, returned
> >  *                        file not necessarily zero at offset
> >  *  f    f        t       sectors preallocated but read from backing_hd,
> >  *                        returned file contains garbage at offset
> >
> > I'm not sure what OFFSET_VALID is even supposed to mean for null.
>
> Yeah, and I was even thinking about that a bit yesterday when figuring out
> what to do with nvme. It does highlight the fact that you get garbage when
> reading from the null driver (unless the zero option was enabled, then ZERO
> is set and you know you read zeros instead) - but there is no pointer that
> is preallocated (whether it contains garbage or otherwise) that you can
> actually dereference to read what the guest would see.
>
> > Or in fact, what it is supposed to mean for any protocol driver, because
> > normally it just means I can use this offset for accessing bs->file. But
> > protocol drivers don't have a bs->file, so it's interesting to see that
> > they still all set this flag.
> >
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
>
> Hmm, you're probably right. Maybe that means I should tweak the
> documentation to be more explicit: for a format driver, OFFSET_VALID can
> always be used (and *file will be set to the underlying protocol driver);
> but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS
> itself and there is an actual buffer to read (that is, the protocol driver
> must also be returning DATA and/or ZERO). Or maybe we can indeed state that
> protocol drivers always set *file to NULL (there is no further backing file
> to reference), and thus never need to return OFFSET_VALID (but I'm not sure
> whether that will accidentally propagate back up the call stack and
> negatively affect status queries of format drivers).
>
> Since it is pre-existing, should I respin to address the issue in a separate
> patch, or should that be a followup after this series?

It's a more fundamental question that shouldn't hold up this series. I
just wanted to raise it while I was looking at it. So yes, a followup is
fine.

Kevin
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 02/14/2018 06:05 AM, Kevin Wolf wrote:
> On 13.02.2018 at 21:26, Eric Blake wrote:
> > We are gradually moving away from sector-based interfaces, towards
> > byte-based. Update the null driver accordingly.
> >
> > Signed-off-by: Eric Blake
> > Reviewed-by: Vladimir Sementsov-Ogievskiy
> > Reviewed-by: Fam Zheng
> >
> >      if (s->read_zeroes) {
> > -        return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
> > -    } else {
> > -        return BDRV_BLOCK_OFFSET_VALID | start;
> > +        ret |= BDRV_BLOCK_ZERO;
> >      }
> > +    return ret;
> >  }
>
> Preexisting, but I think this return value is wrong. OFFSET_VALID
> without DATA is documented to have the following semantics:
>
>  * DATA ZERO OFFSET_VALID
>  *  f    t        t       sectors preallocated, read as zero, returned
>  *                        file not necessarily zero at offset
>  *  f    f        t       sectors preallocated but read from backing_hd,
>  *                        returned file contains garbage at offset
>
> I'm not sure what OFFSET_VALID is even supposed to mean for null.

Yeah, and I was even thinking about that a bit yesterday when figuring out
what to do with nvme. It does highlight the fact that you get garbage when
reading from the null driver (unless the zero option was enabled, then ZERO
is set and you know you read zeros instead) - but there is no pointer that
is preallocated (whether it contains garbage or otherwise) that you can
actually dereference to read what the guest would see.

> Or in fact, what it is supposed to mean for any protocol driver, because
> normally it just means I can use this offset for accessing bs->file. But
> protocol drivers don't have a bs->file, so it's interesting to see that
> they still all set this flag.
>
> OFFSET_VALID | DATA might be excusable because I can see that it's
> convenient that a protocol driver refers to itself as *file instead of
> returning NULL there and then the offset is valid (though it would be
> pointless to actually follow the file pointer), but OFFSET_VALID without
> DATA probably isn't.

Hmm, you're probably right. Maybe that means I should tweak the
documentation to be more explicit: for a format driver, OFFSET_VALID can
always be used (and *file will be set to the underlying protocol driver);
but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS
itself and there is an actual buffer to read (that is, the protocol driver
must also be returning DATA and/or ZERO). Or maybe we can indeed state that
protocol drivers always set *file to NULL (there is no further backing file
to reference), and thus never need to return OFFSET_VALID (but I'm not sure
whether that will accidentally propagate back up the call stack and
negatively affect status queries of format drivers).

Since it is pre-existing, should I respin to address the issue in a separate
patch, or should that be a followup after this series?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
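The first documentation rule floated in this message (a protocol driver should only report OFFSET_VALID when *file is the node itself and DATA and/or ZERO is also set) could be expressed as a small check. This is a sketch of that proposal only, not of anything that was merged: protocol_status_is_consistent() is a hypothetical helper, and the ST_* constants are local stand-ins defined here rather than the BDRV_BLOCK_* values from QEMU's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in flag bits, chosen for this sketch only. */
enum {
    ST_DATA         = 0x01,
    ST_ZERO         = 0x02,
    ST_OFFSET_VALID = 0x04,
};

/* Returns true when a protocol driver's status result follows the
 * proposed rule. 'self' is the node that was queried, 'file' is the
 * node it handed back through *file. */
static bool protocol_status_is_consistent(int flags, const void *self,
                                          const void *file)
{
    assert(self != NULL);

    if (!(flags & ST_OFFSET_VALID)) {
        /* Nothing is claimed about the offset, so nothing to check. */
        return true;
    }
    /* The offset must refer to the node itself, and there must be
     * something readable behind it. */
    return file == self && (flags & (ST_DATA | ST_ZERO)) != 0;
}
```

Under this rule, a result of OFFSET_VALID alone, which is what the null driver returns when read-zeroes is off, would be flagged as inconsistent.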
[Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
We are gradually moving away from sector-based interfaces, towards
byte-based. Update the null driver accordingly.

Signed-off-by: Eric Blake
Reviewed-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Fam Zheng

---
v6-v7: no change
v5: minor fix to type of 'ret'
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping parameter
---
 block/null.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index 214d394fff4..806a8631e4d 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
     return 0;
 }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
-                                                     int64_t sector_num,
-                                                     int nb_sectors, int *pnum,
-                                                     BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+                                             bool want_zero, int64_t offset,
+                                             int64_t bytes, int64_t *pnum,
+                                             int64_t *map,
+                                             BlockDriverState **file)
 {
     BDRVNullState *s = bs->opaque;
-    off_t start = sector_num * BDRV_SECTOR_SIZE;
+    int ret = BDRV_BLOCK_OFFSET_VALID;

-    *pnum = nb_sectors;
+    *pnum = bytes;
+    *map = offset;
     *file = bs;

     if (s->read_zeroes) {
-        return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-    } else {
-        return BDRV_BLOCK_OFFSET_VALID | start;
+        ret |= BDRV_BLOCK_ZERO;
     }
+    return ret;
 }

 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
     .bdrv_co_flush_to_disk  = null_co_flush,
     .bdrv_reopen_prepare    = null_reopen_prepare,

-    .bdrv_co_get_block_status   = null_co_get_block_status,
+    .bdrv_co_block_status   = null_co_block_status,

     .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
     .bdrv_aio_flush         = null_aio_flush,
     .bdrv_reopen_prepare    = null_reopen_prepare,

-    .bdrv_co_get_block_status   = null_co_get_block_status,
+    .bdrv_co_block_status   = null_co_block_status,

     .bdrv_refresh_filename  = null_refresh_filename,
 };
--
2.14.3
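The change in calling convention made by this patch can be seen in isolation with a stripped-down model. This is a sketch with assumptions, not the driver code: old_style_status() and new_style_status() are hypothetical names, the flag values and the 512-byte sector size are local stand-ins, and the real callbacks also take a BlockDriverState and report *pnum and *file. The old callback folded the sector-aligned byte offset into its 64-bit return value next to the flag bits; the new one returns only flags and reports the offset through a separate output parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (INT64_C(1) << SECTOR_BITS)

enum {
    ST_ZERO         = 0x02,
    ST_OFFSET_VALID = 0x04,
};

/* Old convention: flags and byte offset share one 64-bit return value.
 * This works because the offset is sector aligned, so its low bits are
 * free to carry flags. */
static int64_t old_style_status(int64_t sector_num, bool read_zeroes)
{
    assert(sector_num >= 0);
    int64_t start = sector_num * SECTOR_SIZE;
    int64_t ret = ST_OFFSET_VALID | start;

    if (read_zeroes) {
        ret |= ST_ZERO;
    }
    return ret;
}

/* New convention: flags only in the return value, offset through *map. */
static int new_style_status(int64_t offset, bool read_zeroes, int64_t *map)
{
    int ret = ST_OFFSET_VALID;

    assert(map != NULL);
    *map = offset;
    if (read_zeroes) {
        ret |= ST_ZERO;
    }
    return ret;
}
```

A caller of the old form has to mask the low bits off to recover the offset, whereas the new form leaves the return value free to be a plain int and lets the offset be any byte position.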