Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status
On 14.05.19 23:50, Eric Blake wrote:
> On 5/14/19 4:42 PM, Max Reitz wrote:
>> Currently, qemu crashes whenever someone queries the block status of an
>> unaligned image tail of an O_DIRECT image:
>> $ echo > foo
>> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
>> Offset          Length          Mapped to       File
>> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
>> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
>> failed.
>>
>> This is because bdrv_co_block_status() checks that the result returned
>> by the driver's implementation is aligned to the request_alignment, but
>> file-posix can fail to do so, which is actually mentioned in a comment
>> there: "[...] possibly including a partial sector at EOF".
>>
>> Fix this by rounding up those partial sectors.
>>
>> There are two possible alternative fixes:
>> (1) We could refuse to open unaligned image files with O_DIRECT
>>     altogether.  That sounds reasonable until you realize that qcow2
>>     does not necessarily fill up its metadata clusters, and that nobody
>>     runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
>>     files usually have an unaligned image tail.
>
> Yep, non-starter.
>
>>
>> (2) bdrv_co_block_status() could ignore unaligned tails.  It actually
>>     throws away everything past the EOF already, so that sounds
>>     reasonable.
>>     Unfortunately, the block layer knows file lengths only with a
>>     granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
>>     would have to guess whether its file length information is inexact
>>     or whether the driver is broken.
>
> Well, if I ever get around to my thread of making the block layer honor
> byte-accurate sizes, instead of rounding up, then there is no longer
> that inexactness.  I think our mails crossed, and you missed another idea
> of mine of having block drivers (probably only file-posix, per your
> audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
> as the key for letting the block layer know whether the unaligned answer
> was due to size rounding.

Yes, that EOF change makes sense, I think.  Not least because right now
the EOF detection in block/io.c has to be a bit wonky considering that
it's inexact...

But to be honest, returning the EOF flag from the drivers would have
required me to modify all drivers.  I felt like maybe that's something
to be left for another time. :-)

OTOH, I don’t know whether returning the EOF flag from the drivers would
still make sense if we had a byte-accurate bdrv_getlength()...

>> Fixing what raw_co_block_status() returns is the safest thing to do.
>
> Agree.
>
>>
>> There seems to be no other block driver that sets request_alignment and
>> does not make sure that it always returns aligned values.
>
> Thanks for auditing.
>
>>
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Max Reitz
>> ---
>>  block/file-posix.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index e09e15bbf8..f489a5420c 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -2488,6 +2488,9 @@ static int coroutine_fn
>> raw_co_block_status(BlockDriverState *bs,
>>      off_t data = 0, hole = 0;
>>      int ret;
>>
>> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
>> +           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
>> +
>
> Can write in one line as:
>
> assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));

Ah, yeah, sure, why not.

>>      ret = fd_open(bs);
>>      if (ret < 0) {
>>          return ret;
>> @@ -2513,6 +2516,20 @@ static int coroutine_fn
>> raw_co_block_status(BlockDriverState *bs,
>>          /* On a data extent, compute bytes to the end of the extent,
>>           * possibly including a partial sector at EOF. */
>>          *pnum = MIN(bytes, hole - offset);
>> +
>> +        /*
>> +         * We are not allowed to return partial sectors, though, so
>> +         * round up if necessary.
>> +         */
>> +        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
>> +            int64_t file_length = raw_getlength(bs);
>> +            if (file_length > 0) {
>> +                /* Ignore errors, this is just a safeguard */
>> +                assert(hole == file_length);
>> +            }
>> +            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
>> +        }
>
> Reviewed-by: Eric Blake

Thanks!  I'll send a v2 with shorter assert().

Max

> bl.request_alignment is normally 1 (making this a no-op), but is
> definitely larger for O_DIRECT images (where rounding up and treating
> the post-EOF hole the same as the rest of the sector is the same thing
> that NBD chose to do).
Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status
On 5/14/19 4:42 PM, Max Reitz wrote:
> Currently, qemu crashes whenever someone queries the block status of an
> unaligned image tail of an O_DIRECT image:
> $ echo > foo
> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
> Offset          Length          Mapped to       File
> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
> failed.
>
> This is because bdrv_co_block_status() checks that the result returned
> by the driver's implementation is aligned to the request_alignment, but
> file-posix can fail to do so, which is actually mentioned in a comment
> there: "[...] possibly including a partial sector at EOF".
>
> Fix this by rounding up those partial sectors.
>
> There are two possible alternative fixes:
> (1) We could refuse to open unaligned image files with O_DIRECT
>     altogether.  That sounds reasonable until you realize that qcow2
>     does not necessarily fill up its metadata clusters, and that nobody
>     runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
>     files usually have an unaligned image tail.

Yep, non-starter.

>
> (2) bdrv_co_block_status() could ignore unaligned tails.  It actually
>     throws away everything past the EOF already, so that sounds
>     reasonable.
>     Unfortunately, the block layer knows file lengths only with a
>     granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
>     would have to guess whether its file length information is inexact
>     or whether the driver is broken.

Well, if I ever get around to my thread of making the block layer honor
byte-accurate sizes, instead of rounding up, then there is no longer
that inexactness.  I think our mails crossed, and you missed another idea
of mine of having block drivers (probably only file-posix, per your
audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
as the key for letting the block layer know whether the unaligned answer
was due to size rounding.

>
> Fixing what raw_co_block_status() returns is the safest thing to do.

Agree.

>
> There seems to be no other block driver that sets request_alignment and
> does not make sure that it always returns aligned values.

Thanks for auditing.

>
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz
> ---
>  block/file-posix.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e09e15bbf8..f489a5420c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2488,6 +2488,9 @@ static int coroutine_fn
> raw_co_block_status(BlockDriverState *bs,
>      off_t data = 0, hole = 0;
>      int ret;
>
> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
> +           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
> +

Can write in one line as:

assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));

>      ret = fd_open(bs);
>      if (ret < 0) {
>          return ret;
> @@ -2513,6 +2516,20 @@ static int coroutine_fn
> raw_co_block_status(BlockDriverState *bs,
>          /* On a data extent, compute bytes to the end of the extent,
>           * possibly including a partial sector at EOF. */
>          *pnum = MIN(bytes, hole - offset);
> +
> +        /*
> +         * We are not allowed to return partial sectors, though, so
> +         * round up if necessary.
> +         */
> +        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
> +            int64_t file_length = raw_getlength(bs);
> +            if (file_length > 0) {
> +                /* Ignore errors, this is just a safeguard */
> +                assert(hole == file_length);
> +            }
> +            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
> +        }

Reviewed-by: Eric Blake

bl.request_alignment is normally 1 (making this a no-op), but is
definitely larger for O_DIRECT images (where rounding up and treating
the post-EOF hole the same as the rest of the sector is the same thing
that NBD chose to do).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
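
Editorial note on the one-line assert suggested in the review: OR-ing offset and bytes before a single alignment check gives the same answer as two separate checks only when the alignment is a power of two and both values are non-negative. The sketch below illustrates that under those assumptions. IS_ALIGNED, aligned_separately() and aligned_combined() are hypothetical local names written for this note, not QEMU's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an "is n a multiple of m" check (an assumption,
 * not copied from the QEMU tree). */
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Two separate checks, in the style of the posted patch. */
static bool aligned_separately(int64_t offset, int64_t bytes, uint32_t align)
{
    return IS_ALIGNED(offset, align) && IS_ALIGNED(bytes, align);
}

/* One combined check, in the style suggested in review.
 * For align == 2^k, a value is aligned iff its low k bits are zero,
 * and (offset | bytes) has zero low bits iff both operands do. */
static bool aligned_combined(int64_t offset, int64_t bytes, uint32_t align)
{
    /* The equivalence relies on a power-of-two alignment. */
    assert(align != 0 && (align & (align - 1)) == 0);
    /* ... and on non-negative operands. */
    assert(offset >= 0 && bytes >= 0);
    return IS_ALIGNED(offset | bytes, align);
}
```

For an alignment that is not a power of two the shortcut would give wrong answers: with an alignment of 3, offset 3 and bytes 6 are both multiples of 3, but 3 | 6 is 7, which is not.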
[Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status
Currently, qemu crashes whenever someone queries the block status of an
unaligned image tail of an O_DIRECT image:
$ echo > foo
$ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
Offset          Length          Mapped to       File
qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
failed.

This is because bdrv_co_block_status() checks that the result returned
by the driver's implementation is aligned to the request_alignment, but
file-posix can fail to do so, which is actually mentioned in a comment
there: "[...] possibly including a partial sector at EOF".

Fix this by rounding up those partial sectors.

There are two possible alternative fixes:
(1) We could refuse to open unaligned image files with O_DIRECT
    altogether.  That sounds reasonable until you realize that qcow2
    does not necessarily fill up its metadata clusters, and that nobody
    runs qemu-img create with O_DIRECT.  Therefore, unpreallocated qcow2
    files usually have an unaligned image tail.

(2) bdrv_co_block_status() could ignore unaligned tails.  It actually
    throws away everything past the EOF already, so that sounds
    reasonable.
    Unfortunately, the block layer knows file lengths only with a
    granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
    would have to guess whether its file length information is inexact
    or whether the driver is broken.

Fixing what raw_co_block_status() returns is the safest thing to do.

There seems to be no other block driver that sets request_alignment and
does not make sure that it always returns aligned values.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz
---
 block/file-posix.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index e09e15bbf8..f489a5420c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     off_t data = 0, hole = 0;
     int ret;
 
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
+           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+
     ret = fd_open(bs);
     if (ret < 0) {
         return ret;
@@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         /* On a data extent, compute bytes to the end of the extent,
          * possibly including a partial sector at EOF. */
         *pnum = MIN(bytes, hole - offset);
+
+        /*
+         * We are not allowed to return partial sectors, though, so
+         * round up if necessary.
+         */
+        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
+            int64_t file_length = raw_getlength(bs);
+            if (file_length > 0) {
+                /* Ignore errors, this is just a safeguard */
+                assert(hole == file_length);
+            }
+            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
+        }
+
         ret = BDRV_BLOCK_DATA;
     } else {
         /* On a hole, compute bytes to the beginning of the next extent. */
--
2.21.0
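
Editorial note on the rounding step in the patch above: the arithmetic can be tried in isolation. The following is a minimal sketch, not the driver code. data_extent_length() and the SKETCH_* macros are hypothetical names, the hole position is passed in directly instead of being found by seeking, and the 512-byte alignment in the usage note is an assumed value for an O_DIRECT image.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for MIN() and ROUND_UP() (assumptions for this sketch). */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/*
 * Length of the data extent starting at offset, given that the next
 * hole (or EOF) is at byte position hole.  The result is rounded up
 * to align so that a partial sector at EOF is reported as a full one.
 */
static int64_t data_extent_length(int64_t offset, int64_t bytes,
                                  int64_t hole, int64_t align)
{
    /* The request itself must already be aligned, as the patch asserts. */
    assert(align > 0 && offset % align == 0 && bytes % align == 0);
    /* We are on a data extent, so the hole lies strictly after offset. */
    assert(hole > offset);

    int64_t pnum = SKETCH_MIN(bytes, hole - offset);

    if (pnum % align != 0) {
        /*
         * An unaligned pnum can only come from hole - offset, because
         * bytes is aligned.  It is therefore smaller than bytes, and
         * rounding it up can never exceed the requested length.
         */
        pnum = SKETCH_ROUND_UP(pnum, align);
    }
    return pnum;
}
```

For the one-byte file from the commit message, data_extent_length(0, 512, 1, 512) yields 512 instead of 1, which is the aligned value the block layer's assertion expects. With an alignment of 1 the rounding branch is never taken.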