On 26.08.20 08:49, Vladimir Sementsov-Ogievskiy wrote: > 25.08.2020 18:11, Max Reitz wrote: >> On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote: >>> It's intended to be inserted between format and protocol nodes to >>> preallocate additional space (expanding protocol file) on writes >>> crossing EOF. It improves performance for file-systems with slow >>> allocation. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>> --- >>> docs/system/qemu-block-drivers.rst.inc | 26 +++ >>> qapi/block-core.json | 20 +- >>> block/preallocate.c | 291 +++++++++++++++++++++++++ >>> block/Makefile.objs | 1 + >>> 4 files changed, 337 insertions(+), 1 deletion(-) >>> create mode 100644 block/preallocate.c
[...] >>> diff --git a/block/preallocate.c b/block/preallocate.c >>> new file mode 100644 >>> index 0000000000..bdf54dbd2f >>> --- /dev/null >>> +++ b/block/preallocate.c >>> @@ -0,0 +1,291 @@ >>> +/* >>> + * preallocate filter driver >>> + * >>> + * The driver performs preallocate operation: it is injected above >>> + * some node, and before each write over EOF it does additional >>> preallocating >>> + * write-zeroes request. >>> + * >>> + * Copyright (c) 2020 Virtuozzo International GmbH. >>> + * >>> + * Author: >>> + * Sementsov-Ogievskiy Vladimir <vsement...@virtuozzo.com> >>> + * >>> + * This program is free software; you can redistribute it and/or modify >>> + * it under the terms of the GNU General Public License as published by >>> + * the Free Software Foundation; either version 2 of the License, or >>> + * (at your option) any later version. >>> + * >>> + * This program is distributed in the hope that it will be useful, >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>> + * GNU General Public License for more details. >>> + * >>> + * You should have received a copy of the GNU General Public License >>> + * along with this program. If not, see <http://www.gnu.org/licenses/>. >>> + */ >>> + >>> +#include "qemu/osdep.h" >>> + >>> +#include "qapi/error.h" >>> +#include "qemu/module.h" >>> +#include "qemu/option.h" >>> +#include "qemu/units.h" >>> +#include "block/block_int.h" >>> + >>> + >>> +typedef struct BDRVPreallocateState { >>> + int64_t prealloc_size; >>> + int64_t prealloc_align; >>> + >>> + /* >>> + * Filter is started as not-active, so it doesn't do any >>> preallocations nor >>> + * requires BLK_PERM_RESIZE on its child. This is needed to >>> create filter >>> + * above another node-child and then do bdrv_replace_node to >>> insert the >>> + * filter. >> >> A bit weird, but seems fair. It’s basically just a cache for whether >> this node has a writer on it or not. >> >> Apart from the weirdness, I don’t understand the “another node-child”. >> Say you have “format” -> “proto”, and then you want to insert >> “prealloc”. Wouldn’t you blockdev-add prealloc,file=proto and then >> blockdev-replace format.file=prealloc? > > Yes something like this. Actually, I'm about inserting the filter > automatically from block layer code, but your reasoning is about same > thing and is better. > >> So what is that “other node-child”? > > Hmm, just my misleading wording. At least '-' in wrong place. Just > "other node child", or "child of another node".. OK. >>> + * >>> + * Filter becomes active the first time it detects that its >>> parents have >>> + * BLK_PERM_RESIZE on it. >>> + * Filter becomes active forever: it doesn't lose active status >>> if parents >>> + * lose BLK_PERM_RESIZE, otherwise we'll not be able to shrink >>> the file on >>> + * filter close. >> >> Oh, the file is shrunk? That wasn’t clear from the documentation. >> >> Hm. Seems like useful behavior. I just wonder if keeping the >> permission around indefinitely makes sense. Why not just truncate it >> when the permission is revoked? > > How? Parent is closed earlier, so on close we will not have the > permission. So, we force-keep the permission up to our close(). I thought that preallocate_child_perm() would be invoked when the parent is detached, so we could do the truncate there, before relinquishing preallocate’s RESIZE permission. [...] >>> +static void preallocate_close(BlockDriverState *bs) >>> +{ >>> + BDRVPreallocateState *s = bs->opaque; >>> + >>> + if (s->active && s->data_end >= 0 && >>> + bdrv_getlength(bs->file->bs) > s->data_end) >>> + { >>> + bdrv_truncate(bs->file, s->data_end, true, >>> PREALLOC_MODE_OFF, 0, NULL); >> >> Now that I think more about it... What if there are other writers on >> bs->file? This may throw data away. > > Good point. Actually, if bs->file has other writing parents, the logic > of the filter > around "data_end" is broken. So we must unshare WRITE and RESIZE > permissions. That’s certainly a heavy hammer, but it’d work. >> Maybe BDS.wr_highest_offset can >> help to make a better call? > > Anyway, we'll have to use wr_highest_offset of the filter not the child > node > (in the child wr_highest_offset will track preallocations as well), That’s true. > so we want to unshare WRITE/RESIZE permissions. [...] >>> +static int coroutine_fn >>> +preallocate_co_truncate(BlockDriverState *bs, int64_t offset, >>> + bool exact, PreallocMode prealloc, >>> + BdrvRequestFlags flags, Error **errp) >>> +{ >>> + BDRVPreallocateState *s = bs->opaque; >>> + int ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, >>> flags, errp); >>> + >>> + /* s->data_end may become negative here, which means unknown >>> data end */ >>> + s->data_end = bdrv_getlength(bs->file->bs); >> >> What would be the problem with just setting data_end = offset? We only >> use data_end to truncate down on close, so basically repeat the >> bdrv_co_truncate() call here, which seems correct. > > We can use offset only if ret >= 0 && exact is true.. Well, on close, we ignore any errors from truncate() anyway. So even if we used exact=false here, it shouldn’t matter. > But simpler is > just call > bdrv_getlength() anyway. Certainly simpler than thinking about potential edge cases, true. Max
signature.asc
Description: OpenPGP digital signature