08.06.2020 12:38, Peter Krempa wrote:
On Sat, Jun 06, 2020 at 09:55:13 +0300, Vladimir Sementsov-Ogievskiy wrote:
05.06.2020 13:59, Peter Krempa wrote:
On Fri, Jun 05, 2020 at 12:07:47 +0200, Kevin Wolf wrote:
Am 05.06.2020 um 11:58 hat Peter Krempa geschrieben:
On Fri, Jun 05, 2020 at 11:44:07 +0200, Kevin Wolf wrote:


The above was actually inspired by a very recent problem I have in my
attempt to use the dirty bitmap populate job to refactor how libvirt
handles bitmaps. I've just figured out that I need to shuffle around
some stuff as I can't run the dirty-bitmap-populate job while an active
layer commit is in synchronised phase and I wanted to do the merging at
that point. That reminded me of a possible gotcha in having to sequence
the blockjobs which certainly would be more painful.

It would probably be good to have not only an iotests case that tests
the low-level functionality of the block job, but also one that
resembles the way libvirt would actually use it in combination with
other jobs.

Hi! Sorry for missing the discussion for so long.

About the new job semantics: if you create temporary bitmaps anyway, I do think
that we should allow populating directly into the target bitmap, without creating
any internal temporary bitmaps. I suggested it when reviewing v1, but John argued
for more transaction-like semantics, consistent with other jobs. Still, we can
support both modes if we want.

Allowing one target to be used by several populating jobs is an interesting idea. The current
series does "bdrv_dirty_bitmap_set_busy(target_bitmap, true);", which forbids
it. Hmm. If we just drop that, nothing prevents the user from removing the target bitmap during the
job. So we'll need something like a collective-busy bitmap.
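One way to picture a "collective-busy" bitmap is a use count instead of a single
busy flag: every populate job targeting the bitmap takes a reference, and removal
is refused while any reference is held. The toy Python model below is only an
illustration; the class and method names are hypothetical and do not correspond
to QEMU's BdrvDirtyBitmap API.

```python
class DirtyBitmap:
    """Toy stand-in for a named dirty bitmap (hypothetical, not QEMU code)."""

    def __init__(self, name):
        self.name = name
        self.bits = set()    # indices of dirty chunks
        self.users = 0       # hypothetical use count replacing the busy flag

    def acquire(self):
        # Each populate job targeting this bitmap takes one reference.
        self.users += 1

    def release(self):
        if self.users == 0:
            raise RuntimeError("release() without matching acquire()")
        self.users -= 1

    @property
    def busy(self):
        return self.users > 0


class BitmapRegistry:
    """Hypothetical per-node table of bitmaps."""

    def __init__(self):
        self.bitmaps = {}

    def add(self, name):
        bitmap = DirtyBitmap(name)
        self.bitmaps[name] = bitmap
        return bitmap

    def remove(self, name):
        # The user cannot delete a bitmap while any job still writes into it.
        if self.bitmaps[name].busy:
            raise PermissionError(f"bitmap {name!r} is in use")
        del self.bitmaps[name]


reg = BitmapRegistry()
target = reg.add("backup-target")
target.acquire()   # first populate job
target.acquire()   # second populate job, same target
target.release()   # first job finishes; the bitmap is still busy
```

Removal only succeeds after the last job has released the bitmap.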

I certainly can document the way we'll use it but that in turn depends
on how the job behaves.

With the current state of the job I plan to use it in two scenarios:

Preface: I'm currently changing libvirt to use one active bitmap per
checkpoint (checkpoint is name for the point in time we want to take
backup from). This means that a layer of the backing chain can have
multiple active bitmaps depending on how many checkpoints were created
in the current top layer. (previously we've tried to optimize things by
having just one bitmap active, but the semantics were getting too crazy
to be maintainable long-term)

Hmm. I had a plan to create "lazy" disabled bitmaps to optimize the scenario with one
active bitmap, so that disabled bitmaps are not loaded into RAM on start, but only on demand. But
how would that work with the "many active bitmaps" scenario? I don't think that's a good idea.
Possibly we could implement laziness by internally keeping only one active bitmap and merging it here and
there when you request an active bitmap which we haven't actually loaded yet.

Could you describe what exactly the problem is with the "several disabled, one active"
scheme, and how it is solved by "several active"?

The 'several disabled one active' semantics _heavily_ depend on metadata
which must be tracked outside of qemu and is more prone to break. If any
of the intermediate bitmaps is broken or missing, everything breaks.
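A toy Python model of why the "several disabled, one active" scheme is fragile:
computing the changes since a checkpoint requires merging every bitmap from that
checkpoint onwards, so one missing intermediate bitmap makes the result unusable,
whereas with one active bitmap per checkpoint it is a single lookup. Bitmaps are
modelled as sets of dirty chunk indices; the function names are illustrative only.

```python
def changes_disabled_chain(checkpoints, bitmaps, since):
    """'Several disabled, one active': the bitmap named after checkpoint i
    holds writes between checkpoint i and checkpoint i+1; the newest one is
    the only active bitmap. `checkpoints` is ordered oldest first."""
    result = set()
    for name in checkpoints[checkpoints.index(since):]:
        if name not in bitmaps:
            # External metadata and the image disagree; nothing can be trusted.
            raise LookupError(f"missing bitmap for checkpoint {name!r}")
        result |= bitmaps[name]
    return result


def changes_all_active(bitmaps, since):
    """'One active bitmap per checkpoint': each bitmap already holds every
    write since its checkpoint, so no merging is needed."""
    return set(bitmaps[since])


order = ["c1", "c2", "c3"]
disabled_scheme = {"c1": {1}, "c2": {2}, "c3": {3}}
active_scheme = {"c1": {1, 2, 3}, "c2": {2, 3}, "c3": {3}}
```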

Then there's the complexity of the code which handles merging of the
bitmaps during block jobs. Jobs such as blockdev-mirror in full mode and
block-commit squash together the data and we need to do something about
the bitmaps for the backups to work properly afterwards.

Even without considering overlays which were created without propagating
bitmaps, the code was already getting hairy, especially in the case of
backups, where we needed to stitch together all the bitmaps corresponding
to the given point in time the backup is taken from.

When we add overlays without any bitmaps into the mix, the code for
resolving which bitmaps to merge is becoming very unpleasant, hard to
understand and maintain, and that is the main point for the
I don't want to add unnecessary complexity to the libvirt code which
will make it more fragile or hard to understand and fix in the future.

Both points I've heard so far (performance, and backup granularity in
case of non-default qcow2 cluster size) don't seem compelling enough to
me to make my life implementing the feature in libvirt so much harder.
Also, users really can just remove the point in time they wish to back up
from after a successful backup, which will also remove the corresponding
active bitmap.

All of that is reasonable enough.

Then we really need to refactor the code around bitmap support. Currently we
keep all active bitmaps in RAM and update them in a loop on each write.
But obviously we could keep only one (with the smallest granularity of all the
dirty bitmaps) to track guest writes.
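A sketch of that refactoring idea, assuming a hypothetical internal design: the
write path updates a single tracking set at the smallest granularity, and each
user-visible active bitmap is materialized on demand by merging the frozen
per-checkpoint deltas and rescaling to its own granularity. This is a toy Python
model, not QEMU code; WriteTracker and its methods are made up for illustration.

```python
class WriteTracker:
    """Hypothetical: many user-visible active bitmaps, one set on the write path."""

    def __init__(self, granularity):
        self.granularity = granularity  # smallest granularity of all bitmaps, bytes
        self.frozen = []                # (checkpoint, chunk set), oldest first
        self.current_name = None
        self.current = set()            # the only structure touched by writes

    def checkpoint(self, name):
        # Freeze the running delta and start a new one for the new checkpoint.
        if self.current_name is not None:
            self.frozen.append((self.current_name, self.current))
        self.current_name = name
        self.current = set()

    def write(self, offset, length):
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        self.current.update(range(first, last + 1))

    def bitmap(self, name, granularity):
        """Materialize checkpoint `name`'s active bitmap at `granularity`,
        which must be a multiple of the tracking granularity."""
        if granularity % self.granularity:
            raise ValueError("granularity must be a multiple of the tracker's")
        names = [n for n, _ in self.frozen] + [self.current_name]
        chunks = set(self.current)
        for _, delta in self.frozen[names.index(name):]:
            chunks |= delta
        ratio = granularity // self.granularity
        return {c // ratio for c in chunks}


t = WriteTracker(64 * 1024)
t.checkpoint("a")
t.write(0, 4096)
t.checkpoint("b")
t.write(3 * 64 * 1024, 4096)
```

Only one set is touched per guest write, regardless of how many checkpoints exist.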

Bitmaps are no longer propagated over to upper layers when creating
snapshots as we can use block-dirty-bitmap-populate instead.

An unexpected turn. When this topic first started, the reasoning was more like "if
the user forgot to create a bitmap at the start, let's help them". But now it is becoming the
common scenario. Hmm.

It's not only a "user forgot" thing, but more that a systemic change
would be required.

Additionally until _very_ recently it wasn't possible to create bitmaps
using qemu-img, which made it impossible to create overlays for inactive

Did you consider using qemu started in stopped mode to do block
operations in the same manner as for a running VM? What's wrong with that?
Also, there is qemu-storage-daemon, which is being developed as a separate
binary where the block layer is compiled in together with the QMP interface.

VMs. Arguably this has changed so we could require it. It still adds a
moving part which can break if the user doesn't add the bitmap or
requires yet another special case handling if we want to compensate for

As of such, in libvirt's tech-preview implementation that is present
currently upstream, if you create a qcow2 overlay without adding the
appropriate bitmaps, the backups simply won't work.

What do you think about granularity? At Virtuozzo we use a 1M cluster size as the
default for qcow2 images, but we use 64k granularity (the default) for bitmaps, to have
smaller incremental backups. So this is an advantage of creating a bitmap over
relying on block-status capturing by block-dirty-bitmap-populate: with block status
you don't control the dirtiness granularity. So I think that bitmap propagation, or just
creating a new dirty bitmap to track dirtiness from the start of the new snapshot, is
This is a valid argument. Backups in this scenario will be bigger. I
still don't feel like the code needs to be made much more complex
because of it though.
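The granularity argument in numbers, as a small Python calculation: assuming block
status reports allocation in whole qcow2 clusters, three scattered 4 KiB writes
cost 3 MiB of incremental backup with 1M clusters but only 192 KiB with a
64k-granularity bitmap. The write offsets are made up for illustration.

```python
def dirty_bytes(writes, granularity):
    """Bytes an incremental backup has to copy when dirtiness is tracked in
    chunks of `granularity` bytes; `writes` is a list of (offset, length)."""
    chunks = set()
    for offset, length in writes:
        first = offset // granularity
        last = (offset + length - 1) // granularity
        chunks.update(range(first, last + 1))
    return len(chunks) * granularity


MiB = 1024 * 1024
writes = [(0, 4096), (5 * MiB, 4096), (9 * MiB + 70000, 4096)]
print(dirty_bytes(writes, 64 * 1024))  # 196608 (192 KiB): 64k bitmap
print(dirty_bytes(writes, 1 * MiB))    # 3145728 (3 MiB): 1M clusters
```

The ratio is simply cluster size over bitmap granularity (16x here) as long as the
writes land in distinct clusters.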

Maybe there is a simple solution: an option for blockdev-snapshot-sync to
create a bitmap in the new image (or, if you create the image with qemu-img, just create
the bitmap with qemu-img as well, using the new functionality).

Isn't that simpler than relying on block status and running a populate job?

1) backup

Prior to doing the backup I figure out the final backup bitmap. This
involves creating a temporary bitmap, populated by the job, for every
layer of the backing chain above the one which contains the bitmap we
want to take a backup from, and then merging all of them together as a base
for the backup.

(just thinking out loud)

So, assume the sequence top -> middle -> base

If we have a backup which was done when we were in base, then the bitmap is stored
in base. It is loaded and active, but doesn't really change, as base is
opened read-only.
We merge the block-status information of top and middle together with this bitmap,
and aggregate the difference between the last backup and the current state.
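The backup computation described above (top -> middle -> base, with the
checkpoint bitmap in base) can be modelled in a few lines of Python. Each layer's
allocated chunks stand in for the temporary bitmap a populate job would fill; the
data layout and the backup_bitmap name are hypothetical.

```python
def backup_bitmap(chain, bitmap_layer, bitmap):
    """chain: layers ordered top first, each {'name': str, 'allocated': set}.
    bitmap_layer: layer holding the checkpoint's bitmap.
    bitmap: chunks recorded in that bitmap.
    Returns the chunks changed since the checkpoint."""
    result = set(bitmap)
    for layer in chain:
        if layer["name"] == bitmap_layer:
            return result
        # Everything allocated above the bitmap's layer was written after
        # the checkpoint, so it is merged into the backup bitmap.
        result |= layer["allocated"]
    raise LookupError(f"layer {bitmap_layer!r} is not in the chain")


chain = [
    {"name": "top", "allocated": {5}},
    {"name": "middle", "allocated": {2, 3}},
    {"name": "base", "allocated": set(range(10))},
]
```

Allocation in base itself is ignored, since the bitmap stored there already
describes what changed in base after the checkpoint.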

2) blockjobs

Note: This is currently an outline of how things should be, as I hit
the snag with attempting to run the population jobs during the 'ready' state
of an active-layer block-commit/blockdev-mirror job only an hour ago and
I need to change a few things.

2.1) active layer block-commit/blockdev-mirror

When the job reaches the 'ready' state I'll create bitmaps in the
destination/base image of the job for every bitmap in the images
dropped/merged by the blockjob (we use blockdev-mirror in full-sync
mode). This will capture the writes that happen after 'job-complete'.

The job will then be completed and step 2.2 will be executed as well.

So the aim is not to miss any new writes after switching to the new bs, but also not
to capture in the bitmaps the writes which copy the whole disk during the mirror.
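A toy model of step 2.1, under the assumption that every write reaching the
destination, including the job's own copy writes, dirties all bitmaps that exist
in the destination at that moment: creating the new bitmaps only at 'ready' keeps
the bulk copy out of them while still catching guest writes mirrored afterwards.
The Image class and active_commit function are hypothetical.

```python
class Image:
    """Toy image: each write dirties every bitmap present at that moment."""

    def __init__(self):
        self.bitmaps = {}

    def add_bitmap(self, name):
        self.bitmaps[name] = set()

    def write(self, chunks):
        for bits in self.bitmaps.values():
            bits.update(chunks)


def active_commit(dest, top_allocated, top_bitmap_names, writes_after_ready):
    # Bulk phase: the job copies everything allocated in the top image.
    dest.write(top_allocated)
    # 'ready': create an empty bitmap in the destination for each bitmap in top.
    for name in top_bitmap_names:
        dest.add_bitmap(name)
    # In 'ready' state guest writes are mirrored into the destination too.
    for chunks in writes_after_ready:
        dest.write(chunks)
    return dest


base = Image()
base.add_bitmap("c1")   # checkpoint taken while base was the top layer
active_commit(base, {2, 3, 4}, ["c2"], [{9}])
```

In this model the pre-'ready' contents of top's own bitmaps are not yet in the
destination; per the outline, that is left to step 2.2.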

2.2) non-active commit and also continuation of active layer 

After the job is completed successfully I'll create temporary
non-persistent bitmaps for/in the images removed by the blockjob and
merge them into the destination image's bitmaps depending on their
original location in the backing chain so that the bitmap state still
properly describes which blocks have changed.

I don't follow. How do you populate these new temporary bitmaps? They are empty
after creation.

With the 'block-dirty-bitmap-populate' block job.

After that the original images will be blockdev-del-eted. The above is
partially in use today and, since the job is already completed, it also
requires blockdev-reopen to successfully write to the bitmaps.
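A toy Python model of step 2.2 under one possible reading of "depending on their
original location": the allocation of each removed layer (what a populate job
would put into the temporary bitmap) is merged into every bitmap whose checkpoint
predates that layer, while bitmaps that lived in a removed layer move down
unchanged. The data layout and the commit_bitmaps name are hypothetical.

```python
def commit_bitmaps(dest_bitmaps, removed_layers):
    """dest_bitmaps: name -> set of chunks, bitmaps already in the commit
    destination (checkpoints older than any removed layer).
    removed_layers: layers merged into the destination, ordered from the one
    directly above it upwards; each is {'allocated': set, 'bitmaps': dict}.
    Returns the destination's bitmaps after the commit."""
    result = {name: set(bits) for name, bits in dest_bitmaps.items()}
    for layer in removed_layers:
        # Stand-in for the temporary bitmap filled by a populate job.
        temp = layer["allocated"]
        # Checkpoints that predate this layer saw all of its data change.
        for name in result:
            result[name] |= temp
        # Bitmaps that lived in the removed layer move down unchanged; they
        # must not absorb their own layer's allocation (it predates them).
        for name, bits in layer["bitmaps"].items():
            result.setdefault(name, set()).update(bits)
    return result


dest = {"c1": {1}}
removed = [
    {"allocated": {2, 3}, "bitmaps": {"c2": {3}}},
    {"allocated": {7}, "bitmaps": {"c3": {7}}},
]
```

The single temporary bitmap per removed layer ends up merged into several
destination bitmaps, which is the multi-target need discussed below.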


While writing the above down I've actually realized that controlling the
destination of the bitmap might not be as useful as I originally thought,
as in step 2.2 I might need the allocation bitmap merged
into multiple bitmaps, so I'd either need a temporary bitmap anyway or
would have to re-run the job multiple times, which seems wasteful. I'm no
longer fully persuaded that adding the 'merge' step to the dirty
populate blockjob will be the best thing since sliced bread.

What is the 'merge' step?

In some previous replies to Kevin, we discussed that it might be worth
optimizing 'block-dirty-bitmap-populate' to populate the bits directly
in the target bitmap rather than after the job is complete, so
effectively merging it directly. I probably described it wrong here.

Do you mean that populating directly into the target bitmap is not really needed?

I need the bitmap populated by 'block-dirty-bitmap-populate' to be
merged into multiple bitmaps in the new semantics. If the job itself
doesn't support those semantics, changing it just to directly populate
one bitmap doesn't seem to be worth it, since I'll be using intermediate
bitmaps anyway.

Hmm, if the main use case of the populating job is to merge changes since a snapshot
into several bitmaps (all active bitmaps?), then I think it's correct to
implement exactly these semantics, allowing a list of targets as well as a list of
source bitmaps. We can even reuse for the target list the same structure we use
for the source list. And it's simple to implement in QEMU.
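A sketch of the multi-target semantics proposed above, in the same toy Python
model: the block-status allocation is computed once, combined with a list of
source bitmaps, and ORed into every target. The populate_multi function and its
parameters are hypothetical and do not describe the actual arguments of
'block-dirty-bitmap-populate'.

```python
def populate_multi(allocated, source_bitmaps, target_bitmaps):
    """allocated: chunks reported allocated by block status (computed once).
    source_bitmaps: iterable of sets merged in as well.
    target_bitmaps: iterable of sets, each updated in place.
    Returns the merged set that was applied to every target."""
    merged = set(allocated)
    for src in source_bitmaps:
        merged |= src
    for tgt in target_bitmaps:
        tgt |= merged   # set.__ior__ updates the caller's set in place
    return merged


a = {1}
b = {5}
```

With this shape the block-status scan runs once per layer no matter how many
checkpoint bitmaps need updating, which avoids both re-running the job and
keeping an explicit temporary bitmap.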

Best regards,
