Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status

2019-05-14 Thread Max Reitz
On 14.05.19 23:50, Eric Blake wrote:
> On 5/14/19 4:42 PM, Max Reitz wrote:
>> Currently, qemu crashes whenever someone queries the block status of an
>> unaligned image tail of an O_DIRECT image:
>> $ echo > foo
>> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
>> Offset  Length  Mapped to   File
>> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
>> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
>> failed.
>>
>> This is because bdrv_co_block_status() checks that the result returned
>> by the driver's implementation is aligned to the request_alignment, but
>> file-posix can fail to do so, which is actually mentioned in a comment
>> there: "[...] possibly including a partial sector at EOF".
>>
>> Fix this by rounding up those partial sectors.
>>
>> There are two possible alternative fixes:
>> (1) We could refuse to open unaligned image files with O_DIRECT
>> altogether.  That sounds reasonable until you realize that qcow2
>> does not necessarily fill up its metadata clusters, and that nobody
>> runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
>> files usually have an unaligned image tail.
> 
> Yep, non-starter.
> 
>>
>> (2) bdrv_co_block_status() could ignore unaligned tails.  It actually
>> throws away everything past the EOF already, so that sounds
>> reasonable.
>> Unfortunately, the block layer knows file lengths only with a
>> granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
>> would have to guess whether its file length information is inexact
>> or whether the driver is broken.
> 
> Well, if I ever get around to my thread of making the block layer honor
> byte-accurate sizes, instead of rounding up, then there is no longer
> that inexactness. I think our mails crossed, and you missed another idea
> of mine of having block drivers (probably only file-posix, per your
> audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
> as the key for letting the block layer know whether the unaligned answer
> was due to size rounding.

Yes, that EOF change makes sense, I think.  Not least because right now
the EOF detection in block/io.c has to be a bit wonky considering that
it's inexact...  But to be honest, returning the EOF flag from the
drivers would have required me to modify all drivers.  I felt like maybe
that's something to be left for another time. :-)

OTOH, I don’t know whether returning the EOF flag from the drivers would
still make sense if we had a byte-accurate bdrv_getlength()...

>> Fixing what raw_co_block_status() returns is the safest thing to do.
> 
> Agree.
> 
>>
>> There seems to be no other block driver that sets request_alignment and
>> does not make sure that it always returns aligned values.
> 
> Thanks for auditing.
> 
>>
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Max Reitz 
>> ---
>>  block/file-posix.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index e09e15bbf8..f489a5420c 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>>      off_t data = 0, hole = 0;
>>      int ret;
>>  
>> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
>> +           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
>> +
> 
> Can write in one line as:
> 
> assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));

Ah, yeah, sure, why not.

>>      ret = fd_open(bs);
>>      if (ret < 0) {
>>          return ret;
>> @@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>>          /* On a data extent, compute bytes to the end of the extent,
>>           * possibly including a partial sector at EOF. */
>>          *pnum = MIN(bytes, hole - offset);
>> +
>> +        /*
>> +         * We are not allowed to return partial sectors, though, so
>> +         * round up if necessary.
>> +         */
>> +        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
>> +            int64_t file_length = raw_getlength(bs);
>> +            if (file_length > 0) {
>> +                /* Ignore errors, this is just a safeguard */
>> +                assert(hole == file_length);
>> +            }
>> +            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
>> +        }
> 
> Reviewed-by: Eric Blake 

Thanks!

I'll send a v2 with shorter assert().

Max

> bl.request_alignment is normally 1 (making this a no-op), but is
> definitely larger for O_DIRECT images (where rounding up and treating
> the post-EOF hole the same as the rest of the sector is the same thing
> that NBD chose to do).





Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status

2019-05-14 Thread Eric Blake
On 5/14/19 4:42 PM, Max Reitz wrote:
> Currently, qemu crashes whenever someone queries the block status of an
> unaligned image tail of an O_DIRECT image:
> $ echo > foo
> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
> Offset  Length  Mapped to   File
> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
> failed.
> 
> This is because bdrv_co_block_status() checks that the result returned
> by the driver's implementation is aligned to the request_alignment, but
> file-posix can fail to do so, which is actually mentioned in a comment
> there: "[...] possibly including a partial sector at EOF".
> 
> Fix this by rounding up those partial sectors.
> 
> There are two possible alternative fixes:
> (1) We could refuse to open unaligned image files with O_DIRECT
> altogether.  That sounds reasonable until you realize that qcow2
> does not necessarily fill up its metadata clusters, and that nobody
> runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
> files usually have an unaligned image tail.

Yep, non-starter.

> 
> (2) bdrv_co_block_status() could ignore unaligned tails.  It actually
> throws away everything past the EOF already, so that sounds
> reasonable.
> Unfortunately, the block layer knows file lengths only with a
> granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
> would have to guess whether its file length information is inexact
> or whether the driver is broken.

Well, if I ever get around to my thread of making the block layer honor
byte-accurate sizes, instead of rounding up, then there is no longer
that inexactness. I think our mails crossed, and you missed another idea
of mine of having block drivers (probably only file-posix, per your
audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
as the key for letting the block layer know whether the unaligned answer
was due to size rounding.

> 
> Fixing what raw_co_block_status() returns is the safest thing to do.

Agree.

> 
> There seems to be no other block driver that sets request_alignment and
> does not make sure that it always returns aligned values.

Thanks for auditing.

> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
>  block/file-posix.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e09e15bbf8..f489a5420c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      off_t data = 0, hole = 0;
>      int ret;
>  
> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
> +           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
> +

Can write in one line as:

assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
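
The one-line form relies on request_alignment being a power of two: a low bit set in either value stays set in the bitwise OR, so a single check covers both. A minimal sketch of that equivalence, where is_aligned() is only a stand-in for QEMU_IS_ALIGNED() and both_aligned()/or_aligned() are hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU_IS_ALIGNED(); not the QEMU macro itself. */
static bool is_aligned(uint64_t n, uint64_t align)
{
    return n % align == 0;
}

/* Two separate checks, as in the patch as posted. */
static bool both_aligned(uint64_t offset, uint64_t bytes, uint64_t align)
{
    return is_aligned(offset, align) && is_aligned(bytes, align);
}

/*
 * Single check on the bitwise OR. Equivalent to both_aligned() only
 * when align is a power of two (an assumption of this sketch).
 */
static bool or_aligned(uint64_t offset, uint64_t bytes, uint64_t align)
{
    return is_aligned(offset | bytes, align);
}
```

For example, with align = 512, offset = 512 and bytes = 1, the OR is 513 and the combined check fails, just as the two separate checks would.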

>      ret = fd_open(bs);
>      if (ret < 0) {
>          return ret;
> @@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>          /* On a data extent, compute bytes to the end of the extent,
>           * possibly including a partial sector at EOF. */
>          *pnum = MIN(bytes, hole - offset);
> +
> +        /*
> +         * We are not allowed to return partial sectors, though, so
> +         * round up if necessary.
> +         */
> +        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
> +            int64_t file_length = raw_getlength(bs);
> +            if (file_length > 0) {
> +                /* Ignore errors, this is just a safeguard */
> +                assert(hole == file_length);
> +            }
> +            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
> +        }

Reviewed-by: Eric Blake 

bl.request_alignment is normally 1 (making this a no-op), but is
definitely larger for O_DIRECT images (where rounding up and treating
the post-EOF hole the same as the rest of the sector is the same thing
that NBD chose to do).
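
To make the effect concrete, here is a rough standalone model of the data-extent branch. round_up() and data_extent_pnum() are hypothetical names for this sketch, not QEMU functions, and the 512-byte alignment in the example is an assumption about the O_DIRECT case:

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align > 0). */
static int64_t round_up(int64_t n, int64_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Bytes reported for a request of `bytes` at `offset` when the data
 * extent ends at `hole`. Models only the branch touched by the patch.
 */
static int64_t data_extent_pnum(int64_t offset, int64_t bytes,
                                int64_t hole, int64_t align)
{
    int64_t pnum = bytes < hole - offset ? bytes : hole - offset;

    if (pnum % align != 0) {
        /* Partial sector at EOF: report the whole sector as data. */
        pnum = round_up(pnum, align);
    }
    return pnum;
}
```

For the 1-byte file from the commit message, data_extent_pnum(0, 512, 1, 512) yields 512 instead of 1, while with an alignment of 1 the result stays 1. Since the request length is itself aligned, the rounded value does not exceed it.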

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





[Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status

2019-05-14 Thread Max Reitz
Currently, qemu crashes whenever someone queries the block status of an
unaligned image tail of an O_DIRECT image:
$ echo > foo
$ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
Offset  Length  Mapped to   File
qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
failed.

This is because bdrv_co_block_status() checks that the result returned
by the driver's implementation is aligned to the request_alignment, but
file-posix can fail to do so, which is actually mentioned in a comment
there: "[...] possibly including a partial sector at EOF".

Fix this by rounding up those partial sectors.

There are two possible alternative fixes:
(1) We could refuse to open unaligned image files with O_DIRECT
altogether.  That sounds reasonable until you realize that qcow2
does not necessarily fill up its metadata clusters, and that nobody
runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
files usually have an unaligned image tail.

(2) bdrv_co_block_status() could ignore unaligned tails.  It actually
throws away everything past the EOF already, so that sounds
reasonable.
Unfortunately, the block layer knows file lengths only with a
granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
would have to guess whether its file length information is inexact
or whether the driver is broken.
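
The ambiguity in (2) can be sketched as follows. SECTOR_SIZE mirrors BDRV_SECTOR_SIZE (512); sector_granular_length() is a hypothetical helper written for this illustration, not a QEMU function:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* mirrors BDRV_SECTOR_SIZE */

/* File length as seen with sector granularity: rounded up to whole sectors. */
static int64_t sector_granular_length(int64_t byte_length)
{
    return (byte_length + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```

A 1-byte file and a 512-byte file both come out as 512 bytes here, so from the length alone the caller cannot tell whether a result ending at byte 1 reflects the real EOF or a misbehaving driver.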

Fixing what raw_co_block_status() returns is the safest thing to do.

There seems to be no other block driver that sets request_alignment and
does not make sure that it always returns aligned values.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/file-posix.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index e09e15bbf8..f489a5420c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     off_t data = 0, hole = 0;
     int ret;
 
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
+           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+
     ret = fd_open(bs);
     if (ret < 0) {
         return ret;
@@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         /* On a data extent, compute bytes to the end of the extent,
          * possibly including a partial sector at EOF. */
         *pnum = MIN(bytes, hole - offset);
+
+        /*
+         * We are not allowed to return partial sectors, though, so
+         * round up if necessary.
+         */
+        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
+            int64_t file_length = raw_getlength(bs);
+            if (file_length > 0) {
+                /* Ignore errors, this is just a safeguard */
+                assert(hole == file_length);
+            }
+            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
+        }
+
         ret = BDRV_BLOCK_DATA;
     } else {
         /* On a hole, compute bytes to the beginning of the next extent.  */
-- 
2.21.0