Re: RFC: Qemu backup interface plans
25.05.2021 11:50, Max Reitz wrote:
> On 19.05.21 08:11, Vladimir Sementsov-Ogievskiy wrote:
>> 18.05.2021 19:39, Max Reitz wrote:
>>> [...]
>>> On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:
>>>> [...]
>>>> Note also that there is another benefit of such a thing: we'll implement this callback in the qcow2 driver, so that backup will read clusters not in guest cluster order, but in host cluster order, to read more sequentially, which should bring better performance on rotating disks.
>>> I’m not exactly sure how you envision this to work, but block_status also already gives you the host offset in *map.
>> But block-status doesn't give a possibility to read sequentially. For this, the user would have to call block-status several times until the whole disk is covered, then sort the segments by host offset.
>> I wonder, could it be implemented as some iterator, like
>>
>>   read_iter = bdrv_get_sequential_read_iter(source)
>>   while (extents = bdrv_read_next(read_iter)):
>>     for ext in extents:
>>       start_writing_task(target, ext.offset, ext.bytes, ext.qiov)
>>
>> where bdrv_read_next will read guest data in host-cluster sequence..
> How would you implement this, though?

I don't know :) That's why I wrote "I wonder".. Anyway, I have enough work to do with all the previous steps.

> qcow2 doesn’t have a reverse mapping either, so it too would need to read all L2 table entries and sort them, wouldn’t it?

Hmm, yes. With current qcow2 that seems to be the only way. And we'll be limited by memory, so probably: read several L2 tables, sort, return the sorted extents, then read the next bunch of L2 tables, and so on. And then, I agree, we can just implement it on top of the current block_status() implementation.

Or we could implement a reverse mapping in qcow2 as an extension. But I doubt that it's worth the complexity.. Still, it depends on how much extra I/O it would cost (almost nothing, as it should work through the qcow2 cache) and how much performance benefit it would bring (no idea, it should be measured).

-- 
Best regards,
Vladimir
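The "read a bunch of L2 tables, sort, emit, repeat" idea above can be sketched roughly as follows. This is a hypothetical Python model, not QEMU code: `Extent` and `host_order_batches` are invented names, and the mapping list stands in for what repeated block-status calls (or an L2-table walk) would report.

```python
# Hypothetical sketch (not QEMU code) of the "sequential read iterator"
# idea: collect a bounded batch of guest->host mappings, sort the batch
# by host offset, and emit extents in that order.  The batch size models
# the memory limit on how many L2-table entries we hold at once.
from dataclasses import dataclass

@dataclass
class Extent:
    guest_offset: int
    host_offset: int
    length: int

def host_order_batches(extents, batch_size):
    """Yield extents in host-offset order, batch_size entries at a time."""
    batch = []
    for ext in extents:          # extents arrive in guest order
        batch.append(ext)
        if len(batch) == batch_size:
            yield from sorted(batch, key=lambda e: e.host_offset)
            batch = []
    if batch:
        yield from sorted(batch, key=lambda e: e.host_offset)

# Guest order != host order (clusters allocated out of order):
mapping = [Extent(0x00000, 0x30000, 0x10000),
           Extent(0x10000, 0x10000, 0x10000),
           Extent(0x20000, 0x20000, 0x10000)]
ordered = list(host_order_batches(mapping, batch_size=8))
```

Within one batch the reads are strictly sequential on the host; across batches the ordering is only approximate, which is the trade-off against memory that the mail describes.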
Re: RFC: Qemu backup interface plans
On 19.05.21 08:11, Vladimir Sementsov-Ogievskiy wrote:
> 18.05.2021 19:39, Max Reitz wrote:
>> [...]
>> On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:
>>> [...]
>>> Note also that there is another benefit of such a thing: we'll implement this callback in the qcow2 driver, so that backup will read clusters not in guest cluster order, but in host cluster order, to read more sequentially, which should bring better performance on rotating disks.
>> I’m not exactly sure how you envision this to work, but block_status also already gives you the host offset in *map.
> But block-status doesn't give a possibility to read sequentially. For this, the user would have to call block-status several times until the whole disk is covered, then sort the segments by host offset.
> I wonder, could it be implemented as some iterator, like
>
>   read_iter = bdrv_get_sequential_read_iter(source)
>   while (extents = bdrv_read_next(read_iter)):
>     for ext in extents:
>       start_writing_task(target, ext.offset, ext.bytes, ext.qiov)
>
> where bdrv_read_next will read guest data in host-cluster sequence..

How would you implement this, though?

qcow2 doesn’t have a reverse mapping either, so it too would need to read all L2 table entries and sort them, wouldn’t it?

Max
Re: RFC: Qemu backup interface plans
17.05.2021 15:07, Vladimir Sementsov-Ogievskiy wrote:
> 3.1 and make it possible to modify the bitmap externally, so that the consumer of fleecing can say to the backup-top filter: "I've already copied these blocks, don't bother with copying them to the temp image". This is to solve [2].
>
> Still, how will the consumer of fleecing reset the shared bitmap after copying blocks? I have the following idea: we make a "discard-bitmap-filter" filter driver, that owns some bitmap and unsets the corresponding bits on discard requests. Also, a read from a region with unset bits fails with EINVAL immediately. This way both consumers (backup job and NBD client) are able to use this interface:
>
> - The backup job can simply call discard on the source; we can add an option for this.
> - An external backup tool will send a TRIM request after reading some region. This way disk space is freed and no extra copy-before-write operations are done.
>
> I also have a side idea that we could implement a READ_ONCE flag, so that READ and TRIM can be done in one NBD command. But this works only for clients that don't want to implement any kind of retrying.
>
> So, finally, here is how it would look (here I call backup-top by its new name, and the "file" child is used instead of "backing", as I propose this change in "[PATCH 00/21] block: publish backup-top filter"):
>
> Pull backup:
>
> ┌────────────┐
> │ NBD export │
> └────────────┘
>        │
>        ▼
> ┌────────────────────────────────────┐  file   ┌────────────────┐  backing  ┌─────────────┐
> │ discard-bitmap filter (bitmap=bm0) │ ──────▶ │ temp qcow2 img │ ────────▶ │ active disk │
> └────────────────────────────────────┘         └────────────────┘           └─────────────┘
>                                                        ▲                          ▲
>                                                        │ target                   │ file
>                                                        │                          │
> ┌───────────┐        ┌───────────────────────────────────────┐                    │
> │ guest blk │ ─────▶ │ copy-before-write filter (bitmap=bm0) │ ───────────────────┘
> └───────────┘        └───────────────────────────────────────┘

Now I think that discard-bitmap is not really a good idea:

1. If it is separate from the internal BlockCopyState::copy_bitmap, then we'll have to consider both bitmaps inside the block-copy code, which is not a small complication.

2. Using BlockCopyState::copy_bitmap directly seems possible: the user creates copy_bitmap by hand and passes it to the copy-before-write filter as an option, and block-copy will use this bitmap directly. The same bitmap is passed to the discard-bitmap filter, so that it can clear bits on discards.

Pros:
- we don't make the block-copy code more complicated

Note then that BlockCopyState::copy_bitmap is used in the block-copy process as follows:
- copy tasks are created to copy dirty areas
- when a task is created, the corresponding dirty area is cleared (so that creating tasks on the same area can't happen)
- if a task fails, the corresponding area becomes dirty again (so that it can be retried)

So, we have Cons:
- If a discard comes while a task is in flight, the bits are already clean. Then, if the task fails, the bits become dirty again. That's wrong. Not very bad, and not worth the complications of [1]. But still, keep it in mind [*].
- We have to use a separate bitmap in the discard filter to fail reads from non-dirty areas (because BlockCopyState::copy_bitmap is cleared by the block-copy process, not only by discards). That is worse.. It just means that the discard filter is a strange thing and we don't have a good understanding of what it should do. Good documentation will help, of course, but...

...this all leads me to a new idea: let's go without the discard-bitmap filter, with the simple logic: if a discard is done on the block-copy target, we inform block-copy about it. So block-copy can:

- clear the corresponding area in its internal bitmap
- [not really important, but it addresses [*]]: cancel in-flight tasks in the discarded area

Pros: avoid the extra filter

-- 
Best regards,
Vladimir
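The "inform block-copy about target discards" idea above, including the in-flight-task corner case marked [*], can be modeled in a few lines. This is a toy Python sketch, not QEMU code: class and method names are invented, and whole clusters stand in for byte ranges.

```python
# Toy model (not QEMU code) of the proposal: a discard on the block-copy
# target clears the corresponding bits of the internal copy bitmap and
# cancels in-flight tasks for that area, so a later task failure cannot
# wrongly re-dirty an already-discarded region.
class BlockCopyModel:
    def __init__(self, nb_clusters):
        self.dirty = set(range(nb_clusters))  # copy_bitmap: still to copy
        self.in_flight = {}                   # cluster -> task id

    def start_task(self, cluster, task_id):
        # Creating a task clears the dirty bit, so no second task can be
        # created for the same area.
        self.dirty.discard(cluster)
        self.in_flight[cluster] = task_id

    def task_failed(self, cluster):
        # On failure the area becomes dirty again for a retry... unless
        # the task was cancelled by a discard meanwhile (the [*] case).
        if self.in_flight.pop(cluster, None) is not None:
            self.dirty.add(cluster)

    def target_discarded(self, cluster):
        # The new idea: discard on the target informs block-copy.
        self.dirty.discard(cluster)
        self.in_flight.pop(cluster, None)  # cancel the in-flight task

m = BlockCopyModel(4)
m.start_task(1, task_id=7)
m.target_discarded(1)   # consumer already copied cluster 1
m.task_failed(1)        # cancelled task: cluster 1 must NOT become dirty
```

Without the cancellation in `target_discarded()`, the failed task would re-add cluster 1 to the dirty set, which is exactly the wrongness described under Cons above.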
Re: RFC: Qemu backup interface plans
On 19.05.2021 13:49, Vladimir Sementsov-Ogievskiy wrote:
> 19.05.2021 14:20, Kevin Wolf wrote:
> > On 19.05.2021 08:11, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 2. Test, that we can start backup job with source = (target of
> > > > > backup-top filter), so that we have "push backup with fleecing".
> > > > > Make an option for backup to start without a filter, when we don't
> > > > > need copy-before-write operations, to not create extra superfluous
> > > > > filter.
> > > >
> > > > OK, so the backup job is not really a backup job, but just anything
> > > > that copies data.
> > >
> > > Not quite. For backup without a filter we should protect the source
> > > from changing, by unsharing the WRITE permission on it.
> > >
> > > I'll try to avoid adding an option. The logic should work like in the
> > > commit job: if there are current writers on the source, we create the
> > > filter. If there are no writers, we just unshare writes and go without
> > > a filter. And for this the copy-before-write filter should be able to
> > > do WRITE_UNCHANGED in the fleecing case.
> >
> > If we ever get to the point where we would make a block-copy job visible
> > to the user, I wouldn't copy the restrictions from the current jobs, but
> > keep it really generic to cover all cases.
> >
> > There is no way for the QMP command starting the job to know what the
> > user is planning to do with the image in the future. Even if it's
> > currently read-only, the user may want to add a writer later.
> >
> > I think this means that we want to always add a filter node, and then
> > possibly dynamically switch between modes if being read-only provides a
> > significant advantage for the job.
>
> Still, in the push-backup-with-fleecing scheme we really don't need the
> second filter, so why insert an extra thing into the block graph?
>
> I see your point, still, that the user may want to add a writer later.
> Still, I'd be surprised if such use-cases exist now.
>
> What about the following:
>
> add a source-mode tri-state parameter for backup:
>
> auto: insert the filter iff there are existing writers [default]
> filtered: insert the filter unconditionally
> immutable: don't insert the filter; will fail if there are existing
>   writers, and creating writers during the block job would be impossible

Yes, that's an option, too.

Kevin
Re: RFC: Qemu backup interface plans
19.05.2021 14:20, Kevin Wolf wrote:
> On 19.05.2021 08:11, Vladimir Sementsov-Ogievskiy wrote:
>>>> 2. Test, that we can start a backup job with source = (target of the backup-top filter), so that we have "push backup with fleecing". Make an option for backup to start without a filter, when we don't need copy-before-write operations, to not create an extra superfluous filter.
>>> OK, so the backup job is not really a backup job, but just anything that copies data.
>> Not quite. For backup without a filter we should protect the source from changing, by unsharing the WRITE permission on it.
>> I'll try to avoid adding an option. The logic should work like in the commit job: if there are current writers on the source, we create the filter. If there are no writers, we just unshare writes and go without a filter. And for this the copy-before-write filter should be able to do WRITE_UNCHANGED in the fleecing case.
> If we ever get to the point where we would make a block-copy job visible to the user, I wouldn't copy the restrictions from the current jobs, but keep it really generic to cover all cases.
> There is no way for the QMP command starting the job to know what the user is planning to do with the image in the future. Even if it's currently read-only, the user may want to add a writer later.
> I think this means that we want to always add a filter node, and then possibly dynamically switch between modes if being read-only provides a significant advantage for the job.
>
> Kevin

Still, in the push-backup-with-fleecing scheme we really don't need the second filter, so why insert an extra thing into the block graph?

I see your point, still, that the user may want to add a writer later. Still, I'd be surprised if such use-cases exist now.

What about the following: add a source-mode tri-state parameter for backup:

- auto: insert the filter iff there are existing writers [default]
- filtered: insert the filter unconditionally
- immutable: don't insert the filter; will fail if there are existing writers, and creating writers during the block job would be impossible

-- 
Best regards,
Vladimir
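The tri-state semantics proposed above can be written out as a small decision function. This is a sketch of the proposal only: `source-mode` and its three values come from the mail, but no such QEMU option exists, and `plan_backup_source` is an invented name.

```python
# Sketch of the proposed source-mode tri-state for backup (a proposal
# from this thread, not an existing QEMU parameter).  Returns a pair
# (insert_cbw_filter, allow_new_writers).
def plan_backup_source(mode, has_writers):
    if mode == "filtered":
        # Insert the copy-before-write filter unconditionally; writers
        # may come and go during the job.
        return (True, True)
    if mode == "immutable":
        # No filter: the source must stay read-only for the whole job.
        if has_writers:
            raise RuntimeError("source has writers, cannot be immutable")
        return (False, False)    # unshare WRITE, go without a filter
    if mode == "auto":
        # Default: insert the filter iff there are existing writers,
        # mirroring the commit-job logic described in the thread.
        return (True, True) if has_writers else (False, False)
    raise ValueError(f"unknown source-mode: {mode}")
```

The "auto" branch is where Kevin's objection bites: once the job has started without a filter, a writer cannot be added later, which is exactly what "immutable" makes explicit.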
Re: RFC: Qemu backup interface plans
On 19.05.2021 08:11, Vladimir Sementsov-Ogievskiy wrote:
> > > 2. Test, that we can start backup job with source = (target of
> > > backup-top filter), so that we have "push backup with fleecing".
> > > Make an option for backup to start without a filter, when we don't
> > > need copy-before-write operations, to not create extra superfluous
> > > filter.
> >
> > OK, so the backup job is not really a backup job, but just anything
> > that copies data.
>
> Not quite. For backup without a filter we should protect the source
> from changing, by unsharing the WRITE permission on it.
>
> I'll try to avoid adding an option. The logic should work like in the
> commit job: if there are current writers on the source, we create the
> filter. If there are no writers, we just unshare writes and go without
> a filter. And for this the copy-before-write filter should be able to
> do WRITE_UNCHANGED in the fleecing case.

If we ever get to the point where we would make a block-copy job visible
to the user, I wouldn't copy the restrictions from the current jobs, but
keep it really generic to cover all cases.

There is no way for the QMP command starting the job to know what the
user is planning to do with the image in the future. Even if it's
currently read-only, the user may want to add a writer later.

I think this means that we want to always add a filter node, and then
possibly dynamically switch between modes if being read-only provides a
significant advantage for the job.

Kevin
Re: RFC: Qemu backup interface plans
18.05.2021 19:39, Max Reitz wrote:
> Hi,
>
> Your proposal sounds good to me in general. Some small independent building blocks that seem to make sense to me.

Thanks! I hope it's not too difficult to read and understand my English.

> On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
>> What we lack in this scheme:
>> 1. handling dirty bitmap in backup-top filter: backup-top does a copy-before-write operation on any guest write, when actually we are interested only in "dirty" regions for incremental backup. A probable solution would be allowing to specify a bitmap for the sync=none mode of backup, but I think what I propose below is better.
>> 2. [actually it's a tricky part of 1]: possibility to not do copy-before-write operations for regions that were already copied to the final backup. With the normal Qemu backup job, this is achieved by the fact that the block-copy state with its internal bitmap is shared between the backup job and the copy-before-write filter.
>> 3. Not a real problem but a fact: the backup block-job does nothing in the scheme, the whole job is done by the filter. So, it would be interesting to have a possibility to simply insert/remove the filter, and avoid block-job creation and managing at all for external backup. (and I'd like to send another RFC on how to insert/remove filters, let's not discuss it here).
>> Next. Think about internal backup. It has one drawback too:
>> 4. If the target is remote with a slow connection, copy-before-write operations will slow down guest writes appreciably. It may be solved with help of image fleecing: we create a temporary qcow2 image, set up the fleecing scheme, and instead of exporting the temp image through NBD we start a second backup with source = temporary image and target = the real backup target (NBD for example).
> How would a second backup work here? Wouldn’t one want a mirror job to copy the data off to the real target? (Because I see backup as something intrinsically synchronous, whereas mirror by default is rather lazy.)
>
> [Note from future me where I read more below: I see you acknowledge that you’ll need to modify backup to do what you need here, i.e. not do any CBW operations. So it’s effectively the same as a mirror that ignores new dirty areas. Which could work without changing mirror if block-copy were to set BDRV_REQ_WRITE_UNCHANGED for the fleecing case, and bdrv_co_write_req_finish() would skip bdrv_set_dirty() for such writes.]

I just feel closer to the backup block-job than to mirror :) Finally, yes, there is no real difference in interface. But in implementation, I prefer to continue developing block-copy. I hope that finally all jobs and img-convert will work through block-copy. (and I'll need BDRV_REQ_WRITE_UNCHANGED anyway for fleecing, so the user can use mirror or backup)

> I mean, it still has the problem that the mirror job can’t tell the CBW filter which areas are already copied off and so don’t need to be preserved anymore, but...

>> Still, with such a solution there are the same [1,2] problems, and 3 becomes worse:
> Not sure how 3 can become worse when you said above it isn’t a real problem (to which I agree).

It's my perfectionism :) Yes, it still isn't a problem, but the number of extra user-visible objects in the architecture increases, which is not good I think.

>> 5. We'll have two jobs and two automatically inserted filters, when actually one filter and one job are enough (as the first job is needed only to insert a filter, and the second job doesn't need a filter at all).
>> Note also that this (starting two backup jobs to make push backup with fleecing) doesn't work now, op-blockers will be against it. It's simple to fix (and in Virtuozzo we live with a downstream-only patch, which allows push backup with fleecing, based on starting two backup jobs).. But I never sent a patch, as I have a better plan, which will solve all the listed problems.
>> So, what I propose:
>> 1. We make the backup-top filter public, so that it can be inserted/removed where the user wants through QMP (how to properly insert/remove filters I'll post in another RFC, as backup-top is not the only filter that can be usefully inserted somewhere).
>> For this first step I've sent a series today:
>> subject: [PATCH 00/21] block: publish backup-top filter
>> id: <20210517064428.16223-1-vsement...@virtuozzo.com>
>> patchew: https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/
>> (note that one of the things in this series is the rename s/backup-top/copy-before-write/; still, I call it backup-top in this letter)
>> This solves [3]. [4, 5] are solved partly: we still have one extra filter, created by backup block jobs, and also I didn't test whether this works; probably some op-blockers or permissions should be tuned. So, let it be step 2:
>> 2. Test that we can start a backup job with source = (target of the backup-top filter), so that we have "push backup with fleecing". Make an option for backup to start without a filter, when we don't need copy-before-write operations, to not create an extra superfluous filter.
Re: RFC: Qemu backup interface plans
Hi,

Your proposal sounds good to me in general. Some small independent building blocks that seem to make sense to me.

On 17.05.21 14:07, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> What we lack in this scheme:
> 1. handling dirty bitmap in backup-top filter: backup-top does a copy-before-write operation on any guest write, when actually we are interested only in "dirty" regions for incremental backup. A probable solution would be allowing to specify a bitmap for the sync=none mode of backup, but I think what I propose below is better.
> 2. [actually it's a tricky part of 1]: possibility to not do copy-before-write operations for regions that were already copied to the final backup. With the normal Qemu backup job, this is achieved by the fact that the block-copy state with its internal bitmap is shared between the backup job and the copy-before-write filter.
> 3. Not a real problem but a fact: the backup block-job does nothing in the scheme, the whole job is done by the filter. So, it would be interesting to have a possibility to simply insert/remove the filter, and avoid block-job creation and managing at all for external backup. (and I'd like to send another RFC on how to insert/remove filters, let's not discuss it here).
> Next. Think about internal backup. It has one drawback too:
> 4. If the target is remote with a slow connection, copy-before-write operations will slow down guest writes appreciably. It may be solved with help of image fleecing: we create a temporary qcow2 image, set up the fleecing scheme, and instead of exporting the temp image through NBD we start a second backup with source = temporary image and target = the real backup target (NBD for example).

How would a second backup work here? Wouldn’t one want a mirror job to copy the data off to the real target? (Because I see backup as something intrinsically synchronous, whereas mirror by default is rather lazy.)

[Note from future me where I read more below: I see you acknowledge that you’ll need to modify backup to do what you need here, i.e. not do any CBW operations. So it’s effectively the same as a mirror that ignores new dirty areas. Which could work without changing mirror if block-copy were to set BDRV_REQ_WRITE_UNCHANGED for the fleecing case, and bdrv_co_write_req_finish() would skip bdrv_set_dirty() for such writes.]

I mean, it still has the problem that the mirror job can’t tell the CBW filter which areas are already copied off and so don’t need to be preserved anymore, but...

> Still, with such a solution there are the same [1,2] problems, and 3 becomes worse:

Not sure how 3 can become worse when you said above it isn’t a real problem (to which I agree).

> 5. We'll have two jobs and two automatically inserted filters, when actually one filter and one job are enough (as the first job is needed only to insert a filter, and the second job doesn't need a filter at all).
> Note also that this (starting two backup jobs to make push backup with fleecing) doesn't work now, op-blockers will be against it. It's simple to fix (and in Virtuozzo we live with a downstream-only patch, which allows push backup with fleecing, based on starting two backup jobs).. But I never sent a patch, as I have a better plan, which will solve all the listed problems.
> So, what I propose:
> 1. We make the backup-top filter public, so that it can be inserted/removed where the user wants through QMP (how to properly insert/remove filters I'll post in another RFC, as backup-top is not the only filter that can be usefully inserted somewhere).
> For this first step I've sent a series today:
> subject: [PATCH 00/21] block: publish backup-top filter
> id: <20210517064428.16223-1-vsement...@virtuozzo.com>
> patchew: https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/
> (note that one of the things in this series is the rename s/backup-top/copy-before-write/; still, I call it backup-top in this letter)
> This solves [3]. [4, 5] are solved partly: we still have one extra filter, created by backup block jobs, and also I didn't test whether this works; probably some op-blockers or permissions should be tuned. So, let it be step 2:
> 2. Test that we can start a backup job with source = (target of the backup-top filter), so that we have "push backup with fleecing". Make an option for backup to start without a filter, when we don't need copy-before-write operations, to not create an extra superfluous filter.

OK, so the backup job is not really a backup job, but just anything that copies data.

> 3. Support bitmap in backup-top filter, to solve [1]
> 3.1 and make it possible to modify the bitmap externally, so that the consumer of fleecing can say to the backup-top filter: "I've already copied these blocks, don't bother with copying them to the temp image". This is to solve [2].
> Still, how will the consumer of fleecing reset the shared bitmap after copying blocks? I have the following idea: we make a "discard-bitmap-filter" filter driver, that owns some bitmap and unsets the corresponding bits on a discard request. Also, on read from a region with unset bits, EINVAL is returned immediately.
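The discard-bitmap-filter semantics quoted above (discard clears bits, a read of a cleared region fails with EINVAL) fit in a few lines. This is a toy Python model, not QEMU code: the class name is invented, whole clusters stand in for byte ranges, and the string return value stands in for reading from the filter's child node.

```python
# Toy model (not QEMU code) of the proposed "discard-bitmap-filter":
# it owns a bitmap; DISCARD clears the corresponding bits, and a READ
# that touches a cleared region fails immediately with EINVAL, so a
# fleecing client cannot accidentally re-read data it already consumed
# and whose space was freed.
import errno

class DiscardBitmapFilter:
    def __init__(self, nb_clusters):
        self.bitmap = set(range(nb_clusters))  # set bit = still readable

    def discard(self, cluster):
        # The external backup tool sends TRIM after copying this region;
        # the corresponding bit is unset and the space can be freed.
        self.bitmap.discard(cluster)

    def read(self, cluster):
        if cluster not in self.bitmap:
            raise OSError(errno.EINVAL, "read of already-discarded region")
        return f"data-{cluster}"   # stand-in for reading from the child

f = DiscardBitmapFilter(4)
f.read(2)       # fine: this region has not been copied yet
f.discard(2)    # client finished with cluster 2
```

The EINVAL-on-read behaviour is what makes the filter safe to combine with freeing space in the temporary image: any use-after-TRIM by the client becomes a loud error instead of silently returning zeroes or stale data.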
RFC: Qemu backup interface plans
Hi all!

I'd like to share and discuss some plans I have for the Qemu backup interface. (Actually, most of this I presented at KVM Forum long ago.. But now I'm a lot closer to realization:)

I'd start with a reminder about image fleecing. We have the image fleecing scheme to export a point-in-time state of the active disk (iotest 222):

                       backup(sync=none)
                    ┌─────────────────────────┐
                    ▼                         │
┌────────────┐   ┌────────────────┐  backing  │ ┌─────────────┐
│ NBD export │ ─▶│ temp qcow2 img │ ──────────▶│ active disk │
└────────────┘   └────────────────┘            └─────────────┘
                                                      ▲
                 ┌───────────┐                        │
                 │ guest blk │ ───────────────────────┘
                 └───────────┘

Actually, the backup job inserts a backup-top filter, so in detail it looks like:

                       backup(sync=none)
                    ┌─────────────────────────┐
                    ▼                         │
┌────────────┐   ┌────────────────┐  backing  │ ┌─────────────┐
│ NBD export │ ─▶│ temp qcow2 img │ ──────────▶│ active disk │
└────────────┘   └────────────────┘            └─────────────┘
                         ▲ target                     ▲
                         │                            │ backing
                 ┌───────────┐   ┌────────────┐       │
                 │ guest blk │ ─▶│ backup-top │ ──────┘
                 └───────────┘   └────────────┘

This scheme is also called external backup or pull backup. It allows some external tool to write the data to the actual backup, while Qemu only provides this data. We also support incremental external backup: Qemu can manage dirty bitmaps in any way the user wants, and we can export bitmaps through the NBD protocol. So, the client of the NBD export can get the bitmap and read only the "dirty" regions of the exported image.

What we lack in this scheme:

1. handling dirty bitmap in backup-top filter: backup-top does a copy-before-write operation on any guest write, when actually we are interested only in "dirty" regions for incremental backup. A probable solution would be allowing to specify a bitmap for the sync=none mode of backup, but I think what I propose below is better.

2. [actually it's a tricky part of 1]: possibility to not do copy-before-write operations for regions that were already copied to the final backup. With the normal Qemu backup job, this is achieved by the fact that the block-copy state with its internal bitmap is shared between the backup job and the copy-before-write filter.

3. Not a real problem but a fact: the backup block-job does nothing in the scheme, the whole job is done by the filter. So, it would be interesting to have a possibility to simply insert/remove the filter, and avoid block-job creation and managing at all for external backup. (and I'd like to send another RFC on how to insert/remove filters, let's not discuss it here).

Next. Think about internal backup. It has one drawback too:

4. If the target is remote with a slow connection, copy-before-write operations will slow down guest writes appreciably. It may be solved with help of image fleecing: we create a temporary qcow2 image, set up the fleecing scheme, and instead of exporting the temp image through NBD we start a second backup with source = temporary image and target = the real backup target (NBD for example). Still, with such a solution there are the same [1,2] problems, and 3 becomes worse:

5. We'll have two jobs and two automatically inserted filters, when actually one filter and one job are enough (as the first job is needed only to insert a filter, and the second job doesn't need a filter at all).

Note also that this (starting two backup jobs to make push backup with fleecing) doesn't work now, op-blockers will be against it. It's simple to fix (and in Virtuozzo we live with a downstream-only patch, which allows push backup with fleecing, based on starting two backup jobs).. But I never sent a patch, as I have a better plan, which will solve all the listed problems.

So, what I propose:

1. We make the backup-top filter public, so that it can be inserted/removed where the user wants through QMP (how to properly insert/remove filters I'll post in another RFC, as backup-top is not the only filter that can be usefully inserted somewhere).

For this first step I've sent a series today:

subject: [PATCH 00/21] block: publish backup-top filter
id: <20210517064428.16223-1-vsement...@virtuozzo.com>
patchew: https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/

(note that one of the things in this series is the rename s/backup-top/copy-before-write/; still, I call it backup-top in this letter)

This solves [3]. [4, 5] are solved partly: we still have one extra filter, created by backup block jobs, and also I didn't test whether this works; probably some op-blockers or permissions should be tuned.