Am 16.07.2024 um 16:41 hat Andrey Drobyshev geschrieben: > From: "Denis V. Lunev" <d...@openvz.org> > > We have observed that some clusters in the QCOW2 files are zeroed > while preallocation filter is used. > > We are able to trace down the following sequence when prealloc-filter > is used: > co=0x55e7cbed7680 qcow2_co_pwritev_task() > co=0x55e7cbed7680 preallocate_co_pwritev_part() > co=0x55e7cbed7680 handle_write() > co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes() > co=0x55e7cbed7680 raw_do_pwrite_zeroes() > co=0x7f9edb7fe500 do_fallocate() > > Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine > 0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is > time to handle next coroutine, which > co=0x55e7cbee91b0 qcow2_co_pwritev_task() > co=0x55e7cbee91b0 preallocate_co_pwritev_part() > co=0x55e7cbee91b0 handle_write() > co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes() > co=0x55e7cbee91b0 raw_do_pwrite_zeroes() > co=0x7f9edb7deb00 do_fallocate() > > The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced > file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for > the same area. This means that if (once fallocate is started inside > 0x7f9edb7deb00) original fallocate could end and the real write will > be executed. In that case write() request is handled at the same time > as fallocate(). > > The patch moves s->file_lock assignment before fallocate and that is
s/file_lock/file_end/? > crucial. The idea is that all subsequent requests into the area > being preallocation will be issued as just writes without fallocate > to this area and they will not proceed thanks to overlapping > requests mechanics. If preallocation will fail, we will just switch > to the normal expand-by-write behavior and that is not a problem > except performance. > > Signed-off-by: Denis V. Lunev <d...@openvz.org> > Tested-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com> > --- > block/preallocate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/preallocate.c b/block/preallocate.c > index d215bc5d6d..ecf0aa4baa 100644 > --- a/block/preallocate.c > +++ b/block/preallocate.c > @@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, > int64_t bytes, > > want_merge_zero = want_merge_zero && (prealloc_start <= offset); > > + /* > + * Assign file_end before making actual preallocation. This will ensure > + * that next request performed while preallocation is in progress will > + * be passed without preallocation. > + */ > + s->file_end = prealloc_end; > + > ret = bdrv_co_pwrite_zeroes( > bs->file, prealloc_start, prealloc_end - prealloc_start, > BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT); > @@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, > int64_t bytes, > return false; > } > > - s->file_end = prealloc_end; > return want_merge_zero; > } Until bdrv_co_pwrite_zeroes() completes successfully, the file size is unchanged. In other words, if anything calls preallocate_co_getlength() in the meantime, don't we run into... ret = bdrv_co_getlength(bs->file->bs); if (has_prealloc_perms(bs)) { s->file_end = s->zero_start = s->data_end = ret; } ...and reset s->file_end back to the old value, re-enabling the bug you're trying to fix here? Or do we know that no such code path can be called concurrently for some reason? Kevin