Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-03-01 Thread Kevin Wolf
On 01.03.2018 at 10:57, Vladimir Sementsov-Ogievskiy wrote:
> 01.03.2018 12:48, Kevin Wolf wrote:
> > On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:
> > > 26.02.2018 17:05, Kevin Wolf wrote:
> > > > Essentially, assuming a simple backing chain 'base <- overlay', we got
> > > > these combinations to represent in NBD (with my suggestion of the flags
> > > > to use):
> > > > 
> > > > 1. Cluster allocated in overlay
> > > >    a. non-zero data                         0
> > > >    b. explicit zeroes                       0 or ZERO
> > > > 2. Cluster marked zero in overlay           HOLE | ZERO
> > > > 3. Cluster preallocated/zero in overlay     ZERO
> > > > 4. Cluster unallocated in overlay
> > > >    a. Cluster allocated in base (non-zero)  HOLE
> > > >    b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
> > > >    c. Cluster marked zero in base           HOLE | ZERO
> > > >    d. Cluster preallocated/zero in base     HOLE | ZERO
> > > >    e. Cluster unallocated in base           HOLE | ZERO
> > > > 
> > > > Instead of 'base' you can read 'anywhere in the backing chain' and the
> > > > flags should stay the same.
> > > I think only "anywhere in the backing chain" is valid here. Otherwise,
> > > semantics of bdrv_is_allocated would differ for NBD and for not-NBD.
> > This was meant as a mapping from cases to flags, not the other way
> > round, so really doesn't say anything about the cases where the block is
> > allocated further down the chain.
> > 
> > But yes, it shouldn't make a difference where in the backing chain a
> > block is allocated, so these cases are the same as 4.
> > 
> > > I think, if bdrv_is_allocated returns false, it means that we can skip
> > > this region in copying process, am I right?
> > -ENOCONTEXT? Which copying process?
> > 
> > There are cases where you want to copy such regions, and other cases
> > where you want to skip them. It depends on the use case. For example,
> > 'qemu-img convert' skips them with -B (because the backing file is
> > reused), but not without -B (which creates a full copy).
> > 
> > Kevin
> 
> Hm, I thought that bdrv_is_allocated loops through backings, but it doesn't,
> sorry.

That would be bdrv_is_allocated_above() with a NULL base.
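
The difference between the two queries can be pictured with a toy model. This is only a sketch: ToyLayer and the toy_* helpers are hypothetical stand-ins, not QEMU types or functions, and the choice to stop before (and exclude) the base layer is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTERS 4

/* Hypothetical stand-in for one image in a backing chain. */
typedef struct ToyLayer {
    bool allocated[TOY_CLUSTERS];  /* per-cluster allocation in this layer */
    struct ToyLayer *backing;      /* next layer down, or NULL */
} ToyLayer;

/* Looks at the top layer only; it never follows the backing pointer. */
static bool toy_is_allocated(const ToyLayer *top, int cluster)
{
    assert(top != NULL && cluster >= 0 && cluster < TOY_CLUSTERS);
    return top->allocated[cluster];
}

/*
 * Walks from top towards base and stops before base itself.
 * Passing base == NULL walks the whole chain.
 */
static bool toy_is_allocated_above(const ToyLayer *top, const ToyLayer *base,
                                   int cluster)
{
    for (const ToyLayer *l = top; l != NULL && l != base; l = l->backing) {
        if (toy_is_allocated(l, cluster)) {
            return true;
        }
    }
    return false;
}
```

With a chain 'base <- overlay' where a cluster is allocated only in base, toy_is_allocated(&overlay, ...) is false while toy_is_allocated_above(&overlay, NULL, ...) is true.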

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-03-01 Thread Vladimir Sementsov-Ogievskiy

01.03.2018 12:48, Kevin Wolf wrote:

On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:

26.02.2018 17:05, Kevin Wolf wrote:

Essentially, assuming a simple backing chain 'base <- overlay', we got
these combinations to represent in NBD (with my suggestion of the flags
to use):

1. Cluster allocated in overlay
   a. non-zero data                         0
   b. explicit zeroes                       0 or ZERO
2. Cluster marked zero in overlay           HOLE | ZERO
3. Cluster preallocated/zero in overlay     ZERO
4. Cluster unallocated in overlay
   a. Cluster allocated in base (non-zero)  HOLE
   b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
   c. Cluster marked zero in base           HOLE | ZERO
   d. Cluster preallocated/zero in base     HOLE | ZERO
   e. Cluster unallocated in base           HOLE | ZERO

Instead of 'base' you can read 'anywhere in the backing chain' and the
flags should stay the same.

I think only "anywhere in the backing chain" is valid here. Otherwise,
semantics of bdrv_is_allocated would differ for NBD and for not-NBD.

This was meant as a mapping from cases to flags, not the other way
round, so really doesn't say anything about the cases where the block is
allocated further down the chain.

But yes, it shouldn't make a difference where in the backing chain a
block is allocated, so these cases are the same as 4.


I think, if bdrv_is_allocated returns false, it means that we can skip
this region in copying process, am I right?

-ENOCONTEXT? Which copying process?

There are cases where you want to copy such regions, and other cases
where you want to skip them. It depends on the use case. For example,
'qemu-img convert' skips them with -B (because the backing file is
reused), but not without -B (which creates a full copy).

Kevin


Hm, I thought that bdrv_is_allocated loops through backings, but it 
doesn't, sorry.


--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-03-01 Thread Kevin Wolf
On 01.03.2018 at 08:25, Vladimir Sementsov-Ogievskiy wrote:
> 26.02.2018 17:05, Kevin Wolf wrote:
> > Essentially, assuming a simple backing chain 'base <- overlay', we got
> > these combinations to represent in NBD (with my suggestion of the flags
> > to use):
> > 
> > 1. Cluster allocated in overlay
> >    a. non-zero data                         0
> >    b. explicit zeroes                       0 or ZERO
> > 2. Cluster marked zero in overlay           HOLE | ZERO
> > 3. Cluster preallocated/zero in overlay     ZERO
> > 4. Cluster unallocated in overlay
> >    a. Cluster allocated in base (non-zero)  HOLE
> >    b. Cluster allocated in base (zero)      HOLE or HOLE | ZERO
> >    c. Cluster marked zero in base           HOLE | ZERO
> >    d. Cluster preallocated/zero in base     HOLE | ZERO
> >    e. Cluster unallocated in base           HOLE | ZERO
> > 
> > Instead of 'base' you can read 'anywhere in the backing chain' and the
> > flags should stay the same.
> 
> I think only "anywhere in the backing chain" is valid here. Otherwise,
> semantics of bdrv_is_allocated would differ for NBD and for not-NBD.

This was meant as a mapping from cases to flags, not the other way
round, so really doesn't say anything about the cases where the block is
allocated further down the chain.

But yes, it shouldn't make a difference where in the backing chain a
block is allocated, so these cases are the same as 4.
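
The case-to-flag mapping quoted above could be written down roughly as follows. This is a sketch under assumptions: the ToyCluster enum and toy_nbd_flags() are hypothetical names, the bit values follow the NBD spec text quoted in this thread (HOLE is bit 0, ZERO is bit 1), and where the table allows two answers (1b and 4b) the sketch picks the one that claims less.

```c
#include <assert.h>

#define NBD_STATE_HOLE (1u << 0)  /* bit 0, per the quoted spec text */
#define NBD_STATE_ZERO (1u << 1)  /* bit 1, per the quoted spec text */

/* Hypothetical per-layer cluster states matching the table's wording. */
typedef enum {
    TOY_DATA,           /* allocated, non-zero data */
    TOY_EXPLICIT_ZERO,  /* allocated, data that happens to be all zeroes */
    TOY_ZERO_MARKED,    /* cluster marked zero */
    TOY_PREALLOC_ZERO,  /* preallocated, reads as zero */
    TOY_UNALLOCATED     /* nothing in this layer */
} ToyCluster;

static unsigned toy_nbd_flags(ToyCluster overlay, ToyCluster base)
{
    switch (overlay) {
    case TOY_DATA:
        return 0;                                   /* case 1a */
    case TOY_EXPLICIT_ZERO:
        return 0;                                   /* case 1b: 0 or ZERO */
    case TOY_ZERO_MARKED:
        return NBD_STATE_HOLE | NBD_STATE_ZERO;     /* case 2 */
    case TOY_PREALLOC_ZERO:
        return NBD_STATE_ZERO;                      /* case 3 */
    case TOY_UNALLOCATED:
        break;                                      /* case 4, decided below */
    }

    switch (base) {
    case TOY_DATA:
        return NBD_STATE_HOLE;                      /* case 4a */
    case TOY_EXPLICIT_ZERO:
        return NBD_STATE_HOLE;                      /* case 4b: HOLE or HOLE | ZERO */
    default:
        return NBD_STATE_HOLE | NBD_STATE_ZERO;     /* cases 4c, 4d, 4e */
    }
}
```

Note that the base state only matters once the overlay is unallocated, which is why every case-4 result carries HOLE.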

> I think, if bdrv_is_allocated returns false, it means that we can skip
> this region in copying process, am I right?

-ENOCONTEXT? Which copying process?

There are cases where you want to copy such regions, and other cases
where you want to skip them. It depends on the use case. For example,
'qemu-img convert' skips them with -B (because the backing file is
reused), but not without -B (which creates a full copy).
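
As a very reduced illustration of that use-case dependence (toy_should_copy() is a hypothetical helper, not how qemu-img is implemented):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a region must be copied to the target.
 *
 * allocated_in_top: the region is allocated in the top layer of the source.
 * reuse_backing:    the target keeps using the same backing file
 *                   (the situation described for 'qemu-img convert -B').
 */
static bool toy_should_copy(bool allocated_in_top, bool reuse_backing)
{
    if (reuse_backing) {
        /* Unallocated regions stay reachable through the reused backing file. */
        return allocated_in_top;
    }
    /* Full copy: nothing below the target, so everything is copied. */
    return true;
}
```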

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-28 Thread Vladimir Sementsov-Ogievskiy

26.02.2018 17:05, Kevin Wolf wrote:

On 24.02.2018 at 00:38, Eric Blake wrote:

On 02/23/2018 11:05 AM, Kevin Wolf wrote:

On 23.02.2018 at 17:43, Eric Blake wrote:

OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.

So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
necessary to avoid breaking qemu-img map output.  But you are also right
that OFFSET_VALID without data makes little sense at a protocol layer. So
with that in mind, I'm auditing all of the protocol layers to make sure
OFFSET_VALID ends up as something sane.

That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.

Yet another inconsistency, and it's making me scratch my head today.

By the way, in my byte-based stuff that is now pending on your tree, I tried
hard to NOT change semantics or the set of flags returned by a given driver,
and we agreed that's why you'd accept the series as-is and make me do this
followup exercise.  But it's looking like my followups may end up touching a
lot of the same drivers again, now that I'm looking at what the semantics
SHOULD be (and whatever I do end up tweaking, I will at least make sure that
iotests is still happy with it).

Hm, that's unfortunate, but I don't think we should hold up your first
series just so we can touch the drivers only once.


First, let's read what states the NBD spec is proposing:


It defines the following flags for the flags field:

 NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future 
writes to that area may cause fragmentation or encounter an ENOSPC error); if 
clear, the block is allocated or the server could not otherwise determine its 
status. Note that the use of NBD_CMD_TRIM is related to this status, but that 
the server MAY report a hole even where NBD_CMD_TRIM has not been requested, 
and also that a server MAY report that the block is allocated even where 
NBD_CMD_TRIM has been requested.
 NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes; if 
clear, the block contents are not known. Note that the use of 
NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY report 
zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and also that a 
server MAY report unknown content even where NBD_CMD_WRITE_ZEROES has been 
requested.

It is not an error for a server to report that a region of the export has both 
NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an area are 
undefined, and a client reading such an area should make no assumption as to 
its contents or stability.

So here's how Vladimir proposed implementing it in his series (written
before my byte-based block status stuff went in to your tree):
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html

Server side (3/9):

+int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes,
,
+  NULL, NULL);
+if (ret < 0) {
+return ret;
+}
+
+flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+(ret & BDRV_BLOCK_ZERO  ? NBD_STATE_ZERO : 0);

Client side (6/9):

+*pnum = extent.length >> BDRV_SECTOR_BITS;
+return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
+   (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);

Does anything there strike you as odd?

Two things I noticed while reading the above:

1. NBD doesn't consider backing files, so the definition of holes
becomes ambiguous. Is a hole any block that isn't allocated in the
top layer (may cause fragmentation or encounter an ENOSPC error) or
is it any block that isn't allocated anywhere in the whole backing
chain (may read as non-zero)?

Considering that there is a separate NBD_STATE_ZERO and nothing
forbids a state of NBD_STATE_HOLE without NBD_STATE_ZERO, maybe the
former is more useful. The code you quote implements the latter.

Maybe if we go with the former, we should add a note to the NBD spec
that explicitly says that NBD_STATE_HOLE doesn't imply any specific
content that is returned on reads.

2. Using BDRV_BLOCK_ALLOCATED to determine NBD_STATE_HOLE seems wrong. A
(not preallocated) zero cluster in qcow2 returns BDRV_BLOCK_ALLOCATED
(because we don't fall through to the backing file) even though I
think it's a hole. BDRV_BLOCK_DATA should be used there (which makes
it consistent with 

Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-26 Thread Kevin Wolf
On 24.02.2018 at 00:38, Eric Blake wrote:
> On 02/23/2018 11:05 AM, Kevin Wolf wrote:
> > On 23.02.2018 at 17:43, Eric Blake wrote:
> > > > OFFSET_VALID | DATA might be excusable because I can see that it's
> > > > convenient that a protocol driver refers to itself as *file instead of
> > > > returning NULL there and then the offset is valid (though it would be
> > > > pointless to actually follow the file pointer), but OFFSET_VALID without
> > > > DATA probably isn't.
> > > 
> > > So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> > > necessary to avoid breaking qemu-img map output.  But you are also right
> > > that OFFSET_VALID without data makes little sense at a protocol layer. So
> > > with that in mind, I'm auditing all of the protocol layers to make sure
> > > OFFSET_VALID ends up as something sane.
> > 
> > That's one way to look at it.
> > 
> > The other way is that qemu-img map shouldn't ask the protocol layer for
> > its offset because it already knows the offset (it is what it passes as
> > a parameter to bdrv_co_block_status).
> > 
> > Anyway, it's probably not worth changing the interface, we should just
> > make sure that the return values of the individual drivers are
> > consistent.
> 
> Yet another inconsistency, and it's making me scratch my head today.
> 
> By the way, in my byte-based stuff that is now pending on your tree, I tried
> hard to NOT change semantics or the set of flags returned by a given driver,
> and we agreed that's why you'd accept the series as-is and make me do this
> followup exercise.  But it's looking like my followups may end up touching a
> lot of the same drivers again, now that I'm looking at what the semantics
> SHOULD be (and whatever I do end up tweaking, I will at least make sure that
> iotests is still happy with it).

Hm, that's unfortunate, but I don't think we should hold up your first
series just so we can touch the drivers only once.

> First, let's read what states the NBD spec is proposing:
> 
> > It defines the following flags for the flags field:
> > 
> > NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future 
> > writes to that area may cause fragmentation or encounter an ENOSPC error); 
> > if clear, the block is allocated or the server could not otherwise 
> > determine its status. Note that the use of NBD_CMD_TRIM is related to this 
> > status, but that the server MAY report a hole even where NBD_CMD_TRIM has 
> > not been requested, and also that a server MAY report that the block is 
> > allocated even where NBD_CMD_TRIM has been requested.
> > NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes; 
> > if clear, the block contents are not known. Note that the use of 
> > NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY 
> > report zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and 
> > also that a server MAY report unknown content even where 
> > NBD_CMD_WRITE_ZEROES has been requested.
> > 
> > It is not an error for a server to report that a region of the export has 
> > both NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an 
> > area are undefined, and a client reading such an area should make no 
> > assumption as to its contents or stability.
> 
> So here's how Vladimir proposed implementing it in his series (written
> before my byte-based block status stuff went in to your tree):
> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html
> 
> Server side (3/9):
> 
> +int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes,
> ,
> +  NULL, NULL);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
> +(ret & BDRV_BLOCK_ZERO  ? NBD_STATE_ZERO : 0);
> 
> Client side (6/9):
> 
> +*pnum = extent.length >> BDRV_SECTOR_BITS;
> +return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
> +   (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
> 
> Does anything there strike you as odd?

Two things I noticed while reading the above:

1. NBD doesn't consider backing files, so the definition of holes
   becomes ambiguous. Is a hole any block that isn't allocated in the
   top layer (may cause fragmentation or encounter an ENOSPC error) or
   is it any block that isn't allocated anywhere in the whole backing
   chain (may read as non-zero)?

   Considering that there is a separate NBD_STATE_ZERO and nothing
   forbids a state of NBD_STATE_HOLE without NBD_STATE_ZERO, maybe the
   former is more useful. The code you quote implements the latter.

   Maybe if we go with the former, we should add a note to the NBD spec
   that explicitly says that NBD_STATE_HOLE doesn't imply any specific
   content that is returned on reads.

2. Using BDRV_BLOCK_ALLOCATED to determine NBD_STATE_HOLE seems wrong. A
   (not 

Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Eric Blake

On 02/23/2018 11:05 AM, Kevin Wolf wrote:

On 23.02.2018 at 17:43, Eric Blake wrote:

OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
necessary to avoid breaking qemu-img map output.  But you are also right
that OFFSET_VALID without data makes little sense at a protocol layer. So
with that in mind, I'm auditing all of the protocol layers to make sure
OFFSET_VALID ends up as something sane.


That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.


Yet another inconsistency, and it's making me scratch my head today.

By the way, in my byte-based stuff that is now pending on your tree, I 
tried hard to NOT change semantics or the set of flags returned by a 
given driver, and we agreed that's why you'd accept the series as-is and 
make me do this followup exercise.  But it's looking like my followups 
may end up touching a lot of the same drivers again, now that I'm 
looking at what the semantics SHOULD be (and whatever I do end up 
tweaking, I will at least make sure that iotests is still happy with it).


First, let's read what states the NBD spec is proposing:


It defines the following flags for the flags field:

NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future 
writes to that area may cause fragmentation or encounter an ENOSPC error); if 
clear, the block is allocated or the server could not otherwise determine its 
status. Note that the use of NBD_CMD_TRIM is related to this status, but that 
the server MAY report a hole even where NBD_CMD_TRIM has not been requested, 
and also that a server MAY report that the block is allocated even where 
NBD_CMD_TRIM has been requested.
NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes; if 
clear, the block contents are not known. Note that the use of 
NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY report 
zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and also that a 
server MAY report unknown content even where NBD_CMD_WRITE_ZEROES has been 
requested.

It is not an error for a server to report that a region of the export has both 
NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an area are 
undefined, and a client reading such an area should make no assumption as to 
its contents or stability.


So here's how Vladimir proposed implementing it in his series (written 
before my byte-based block status stuff went in to your tree):

https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html

Server side (3/9):

+int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, 
,

+  NULL, NULL);
+if (ret < 0) {
+return ret;
+}
+
+flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+(ret & BDRV_BLOCK_ZERO  ? NBD_STATE_ZERO : 0);

Client side (6/9):

+*pnum = extent.length >> BDRV_SECTOR_BITS;
+return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
+   (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);

Does anything there strike you as odd?  In isolation, they seemed fine 
to me, but side-by-side, I'm scratching my head: the server queries the 
block layer, and turns BDRV_BLOCK_ALLOCATED into !NBD_STATE_HOLE; the 
client side then takes the NBD protocol and tries to turn it back into 
information to feed the block layer, where !NBD_STATE_HOLE now feeds 
BDRV_BLOCK_DATA.  Why the different choice of bits?


Part of the story is that right now, we document that ONLY the block 
layer sets _ALLOCATED, in io.c, as a result of the driver layer 
returning DATA || ZERO (there are cases where the block layer can return 
ZERO but not ALLOCATED, because the driver layer returned 0 but the 
block layer still knows that area reads as zero).  So Vladimir's patch 
matches the fact that the driver shouldn't set ALLOCATED.  Still, if we 
are tying ALLOCATED to whether there is a hole, then that seems like 
information we should be getting from the driver, not something 
synthesized after we've left the driver!
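
The asymmetry is easy to see by running one status value through both quoted mappings. In this sketch the TOY_BLOCK_* constants are local stand-ins for the BDRV_BLOCK_* flags (their numeric values are an assumption of the sketch), and toy_server_flags() / toy_client_status() are hypothetical names wrapping the two expressions from the quoted patches.

```c
#include <assert.h>

/* Local stand-ins for the block-layer status bits. */
#define TOY_BLOCK_DATA      0x01
#define TOY_BLOCK_ZERO      0x02
#define TOY_BLOCK_ALLOCATED 0x10

#define NBD_STATE_HOLE (1u << 0)
#define NBD_STATE_ZERO (1u << 1)

/* Server side, as in the quoted 3/9 hunk: ALLOCATED decides HOLE. */
static unsigned toy_server_flags(int ret)
{
    return ((ret & TOY_BLOCK_ALLOCATED) ? 0 : NBD_STATE_HOLE) |
           ((ret & TOY_BLOCK_ZERO) ? NBD_STATE_ZERO : 0);
}

/* Client side, as in the quoted 6/9 hunk: HOLE decides DATA. */
static int toy_client_status(unsigned flags)
{
    return ((flags & NBD_STATE_HOLE) ? 0 : TOY_BLOCK_DATA) |
           ((flags & NBD_STATE_ZERO) ? TOY_BLOCK_ZERO : 0);
}
```

A status of ZERO | ALLOCATED without DATA (the kind of value discussed in this thread for a zero cluster that does not fall through to the backing file) leaves the server as NBD_STATE_ZERO alone and comes back on the client as DATA | ZERO, so a DATA bit appears that the server-side driver never reported.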


Then there's the question of file-posix.c: what should it return for a 
hole, ZERO|OFFSET_VALID or DATA|ZERO|OFFSET_VALID?  The wording in 
block.h implies that if DATA is not set, then the area reads as zero to 
the guest, but may have indeterminate value on the underlying file - but 
we KNOW 

Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Kevin Wolf
On 23.02.2018 at 17:43, Eric Blake wrote:
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
> 
> So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
> necessary to avoid breaking qemu-img map output.  But you are also right
> that OFFSET_VALID without data makes little sense at a protocol layer. So
> with that in mind, I'm auditing all of the protocol layers to make sure
> OFFSET_VALID ends up as something sane.

That's one way to look at it.

The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).

Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-23 Thread Eric Blake

On 02/14/2018 06:05 AM, Kevin Wolf wrote:


+static int coroutine_fn null_co_block_status(BlockDriverState *bs,



  if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
  }
+return ret;
  }


Preexisting, but I think this return value is wrong. OFFSET_VALID
without DATA is documented to have the following semantics:

  * DATA ZERO OFFSET_VALID
  *  f    t        t       sectors preallocated, read as zero, returned file not
  *                        necessarily zero at offset
  *  f    f        t       sectors preallocated but read from backing_hd,
  *                        returned file contains garbage at offset

I'm not sure what OFFSET_VALID is even supposed to mean for null.


I'm finally getting around to playing with this.



Or in fact, what it is supposed to mean for any protocol driver, because
normally it just means I can use this offset for accessing bs->file. But
protocol drivers don't have a bs->file, so it's interesting to see that
they still all set this flag.


More precisely, it means "I can use this offset for accessing the 
returned *file".  Format and filter drivers set *file = bs->file (ie. 
their protocol layer), but protocol drivers set *file = bs (ie. 
themselves).  As long as you read it as "the offset is valid in the 
returned *file", and are careful as to _which_ BDS gets returned in 
*file*, it can still make sense.
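
A toy picture of that convention, with hypothetical types (ToyBDS and toy_block_status() are not QEMU definitions, the flag value is a local stand-in, and the identity offset mapping is an assumption of the sketch):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLOCK_OFFSET_VALID 0x04  /* local stand-in flag value */

/* Hypothetical node: a format/filter node has a child, a protocol node does not. */
typedef struct ToyBDS {
    const char *name;
    struct ToyBDS *file;  /* underlying protocol node, or NULL */
} ToyBDS;

/*
 * Report where the data for 'offset' lives.
 * Format/filter nodes point *file at their child; protocol nodes point
 * *file at themselves. Either way, *map is meant as an offset into the
 * node returned in *file.
 */
static int toy_block_status(ToyBDS *bs, long long offset,
                            long long *map, ToyBDS **file)
{
    assert(bs != NULL && map != NULL && file != NULL);
    *map = offset;                      /* toy assumption: 1:1 mapping */
    *file = bs->file ? bs->file : bs;
    return TOY_BLOCK_OFFSET_VALID;
}
```

A caller that follows *file from a format node reaches the protocol node; following it from the protocol node just returns the same node again, which is why doing so is pointless there.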


So next I tried playing with a patch, to see how much returning 
OFFSET_VALID with DATA matters; and it turns out it is easily observable 
anywhere that the underlying protocol bleeds through to the format layer 
(particularly the raw format driver):


$ echo abc > tmp
$ truncate --size=10M tmp

pre-patch:
$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": false, "offset": 4096}]


turn off OFFSET_VALID at the protocol layer:
diff --git i/block/file-posix.c w/block/file-posix.c
index f1591c38490..c05992c1121 100644
--- i/block/file-posix.c
+++ w/block/file-posix.c
@@ -2158,9 +2158,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,


 if (!want_zero) {
 *pnum = bytes;
-*map = offset;
-*file = bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA;
 }

 ret = find_allocation(bs, offset, &data, &hole);
@@ -2183,9 +2181,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,

 *pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
-*map = offset;
-*file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID;
+return ret;
 }

 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,


post-patch:
$ ./qemu-img map --output=json tmp
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true},
{ "start": 4096, "length": 10481664, "depth": 0, "zero": true, "data": false}]
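
The effect on the output can be mimicked with a small formatter that emits the "offset" key only when the status carries OFFSET_VALID. This is a sketch, not qemu-img's code: toy_format_entry() is a hypothetical helper, the TOY_BLOCK_* values are local stand-ins, the "depth" field is left out, and the offset is assumed equal to the start (as in the raw-file example above).

```c
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the block-layer status bits. */
#define TOY_BLOCK_DATA         0x01
#define TOY_BLOCK_ZERO         0x02
#define TOY_BLOCK_OFFSET_VALID 0x04

/* Write one map entry into buf; returns its length, or -1 if it does not fit. */
static int toy_format_entry(char *buf, size_t len, long long start,
                            long long length, int status)
{
    int n = snprintf(buf, len,
                     "{ \"start\": %lld, \"length\": %lld, "
                     "\"zero\": %s, \"data\": %s",
                     start, length,
                     (status & TOY_BLOCK_ZERO) ? "true" : "false",
                     (status & TOY_BLOCK_DATA) ? "true" : "false");
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    /* The offset is only meaningful, and only printed, with OFFSET_VALID. */
    if (status & TOY_BLOCK_OFFSET_VALID) {
        int m = snprintf(buf + n, len - (size_t)n, ", \"offset\": %lld", start);
        if (m < 0 || (size_t)m >= len - (size_t)n) {
            return -1;
        }
        n += m;
    }
    if ((size_t)n + 2 > len) {
        return -1;
    }
    buf[n++] = '}';
    buf[n] = '\0';
    return n;
}
```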





OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but 
necessary to avoid breaking qemu-img map output.  But you are also right 
that OFFSET_VALID without data makes little sense at a protocol layer. 
So with that in mind, I'm auditing all of the protocol layers to make 
sure OFFSET_VALID ends up as something sane.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
On 14.02.2018 at 15:44, Eric Blake wrote:
> On 02/14/2018 06:05 AM, Kevin Wolf wrote:
> > On 13.02.2018 at 21:26, Eric Blake wrote:
> > > We are gradually moving away from sector-based interfaces, towards
> > > byte-based.  Update the null driver accordingly.
> > > 
> > > Signed-off-by: Eric Blake 
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > > Reviewed-by: Fam Zheng 
> > > 
> 
> > >   if (s->read_zeroes) {
> > > -return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
> > > -} else {
> > > -return BDRV_BLOCK_OFFSET_VALID | start;
> > > +ret |= BDRV_BLOCK_ZERO;
> > >   }
> > > +return ret;
> > >   }
> > 
> > Preexisting, but I think this return value is wrong. OFFSET_VALID
> > without DATA is documented to have the following semantics:
> > 
> >   * DATA ZERO OFFSET_VALID
> >   *  f    t        t       sectors preallocated, read as zero, returned file not
> >   *                        necessarily zero at offset
> >   *  f    f        t       sectors preallocated but read from backing_hd,
> >   *                        returned file contains garbage at offset
> > 
> > I'm not sure what OFFSET_VALID is even supposed to mean for null.
> 
> Yeah, and I was even thinking about that a bit yesterday when figuring out
> what to do with nvme.  It does highlight the fact that you get garbage when
> reading from the null driver (unless the zero option was enabled, then ZERO
> is set and you know you read zeros instead) - but there is no pointer that is
> preallocated (whether it contains garbage or otherwise) that you can
> actually dereference to read what the guest would see.
> 
> > 
> > Or in fact, what it is supposed to mean for any protocol driver, because
> > normally it just means I can use this offset for accessing bs->file. But
> > protocol drivers don't have a bs->file, so it's interesting to see that
> > they still all set this flag.
> > 
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
> 
> Hmm, you're probably right.  Maybe that means I should tweak the
> documentation to be more explicit: for a format driver, OFFSET_VALID can
> always be used (and *file will be set to the underlying protocol driver);
> but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS
> itself and there is an actual buffer to read (that is, the protocol driver
> must also be returning DATA and/or ZERO).  Or maybe we can indeed state that
> protocol drivers always set *file to NULL (there is no further backing file
> to reference), and thus never need to return OFFSET_VALID (but I'm not sure
> whether that will accidentally propagate back up the call stack and
> negatively affect status queries of format drivers).
> 
> Since it is pre-existing, should I respin to address the issue in a separate
> patch, or should that be a followup after this series?

It's a more fundamental question that shouldn't hold up this series. I
just wanted to raise it while I was looking at it. So yes, a followup is
fine.

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-14 Thread Eric Blake

On 02/14/2018 06:05 AM, Kevin Wolf wrote:

On 13.02.2018 at 21:26, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 




  if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
  }
+return ret;
  }


Preexisting, but I think this return value is wrong. OFFSET_VALID
without DATA is documented to have the following semantics:

  * DATA ZERO OFFSET_VALID
  *  f    t        t       sectors preallocated, read as zero, returned file not
  *                        necessarily zero at offset
  *  f    f        t       sectors preallocated but read from backing_hd,
  *                        returned file contains garbage at offset

I'm not sure what OFFSET_VALID is even supposed to mean for null.


Yeah, and I was even thinking about that a bit yesterday when figuring 
out what to do with nvme.  It does highlight the fact that you get 
garbage when reading from the null driver (unless the zero option was 
enabled, then ZERO is set and you know you read zeros instead) - but 
there is no pointer that is preallocated (whether it contains garbage or 
otherwise) that you can actually dereference to read what the guest 
would see.




Or in fact, what it is supposed to mean for any protocol driver, because
normally it just means I can use this offset for accessing bs->file. But
protocol drivers don't have a bs->file, so it's interesting to see that
they still all set this flag.

OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


Hmm, you're probably right.  Maybe that means I should tweak the 
documentation to be more explicit: for a format driver, OFFSET_VALID can 
always be used (and *file will be set to the underlying protocol 
driver); but for a protocol driver, OFFSET_VALID only makes sense if 
*file is the BDS itself and there is an actual buffer to read (that is, 
the protocol driver must also be returning DATA and/or ZERO).  Or maybe 
we can indeed state that protocol drivers always set *file to NULL 
(there is no further backing file to reference), and thus never need to 
return OFFSET_VALID (but I'm not sure whether that will accidentally 
propagate back up the call stack and negatively affect status queries of 
format drivers).


Since it is pre-existing, should I respin to address the issue in a 
separate patch, or should that be a followup after this series?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake
We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: minor fix to type of 'ret'
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping parameter
---
 block/null.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index 214d394fff4..806a8631e4d 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
 return 0;
 }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int ret = BDRV_BLOCK_OFFSET_VALID;

-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs;

 if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
 }
+return ret;
 }

 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
 .bdrv_co_flush_to_disk  = null_co_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_aio_flush = null_aio_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
-- 
2.14.3
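
For readers comparing the two callback shapes in the patch above, here is a stand-alone sketch of the old and new conventions. The toy_* functions and TOY_* constants are hypothetical stand-ins (the 512-byte sector size and the flag values are assumptions of the sketch): the old style folds the byte offset into the 64-bit return value, the new style returns flags only and reports the offset through *map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTOR_BITS        9
#define TOY_SECTOR_SIZE        (1LL << TOY_SECTOR_BITS)
#define TOY_OFFSET_MASK        (~(TOY_SECTOR_SIZE - 1))  /* offset part of old return */
#define TOY_BLOCK_ZERO         0x02
#define TOY_BLOCK_OFFSET_VALID 0x04

/* Old, sector-based shape: flags and sector-aligned byte offset share one value. */
static int64_t toy_old_status(int64_t sector_num, int nb_sectors, int *pnum,
                              bool read_zeroes)
{
    int64_t start = sector_num * TOY_SECTOR_SIZE;

    *pnum = nb_sectors;
    return TOY_BLOCK_OFFSET_VALID | start | (read_zeroes ? TOY_BLOCK_ZERO : 0);
}

/* New, byte-based shape: flags in the return value, offset via *map. */
static int toy_new_status(int64_t offset, int64_t bytes, int64_t *pnum,
                          int64_t *map, bool read_zeroes)
{
    int ret = TOY_BLOCK_OFFSET_VALID;

    *pnum = bytes;
    *map = offset;
    if (read_zeroes) {
        ret |= TOY_BLOCK_ZERO;
    }
    return ret;
}
```

Because the old return value reserves its low bits for flags, the offset it carries has to be sector-aligned; the byte-based form has no such coupling.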