On 17.06.19 17:53, Kevin Wolf wrote:
> Am 17.06.2019 um 16:56 hat Max Reitz geschrieben:
>> On 17.06.19 12:36, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.06.2019 23:03, Max Reitz wrote:
>>>> On 14.06.19 18:22, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 14.06.2019 15:57, Max Reitz wrote:
>>>>>> On 14.06.19 11:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 13.06.2019 18:57, Max Reitz wrote:
>>>>>>>> On 29.05.19 17:46, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> Backup-top filter does copy-before-write operation. It should be
>>>>>>>>> inserted above active disk and has a target node for CBW, like the
>>>>>>>>> following:
>>>>>>>>>
>>>>>>>>>     +-------+
>>>>>>>>>     | Guest |
>>>>>>>>>     +-------+
>>>>>>>>>         |r,w
>>>>>>>>>         v
>>>>>>>>>     +------------+  target   +---------------+
>>>>>>>>>     | backup_top |---------->| target(qcow2) |
>>>>>>>>>     +------------+    CBW    +---------------+
>>>>>>>>>         |
>>>>>>>>> backing |r,w
>>>>>>>>>         v
>>>>>>>>>     +-------------+
>>>>>>>>>     | Active disk |
>>>>>>>>>     +-------------+
>>>>>>>>>
>>>>>>>>> The driver will be used in backup instead of write-notifiers.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
>>>>>>>>> ---
>>>>>>>>>  block/backup-top.h  |  64 +++++++++
>>>>>>>>>  block/backup-top.c  | 322 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  block/Makefile.objs |   2 +
>>>>>>>>>  3 files changed, 388 insertions(+)
>>>>>>>>>  create mode 100644 block/backup-top.h
>>>>>>>>>  create mode 100644 block/backup-top.c
>>>>>>>>>
>>>>>>>>> diff --git a/block/backup-top.h b/block/backup-top.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000000..788e18c358
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/block/backup-top.h
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * bdrv_backup_top_append
>>>>>>>>> + *
>>>>>>>>> + * Append backup_top filter node above @source node. @target node
>>>>>>>>> + * will receive the data backed up during CBW operations.
>>>>>>>>> + * New filter together with @target node are attached to @source
>>>>>>>>> + * aio context.
>>>>>>>>> + *
>>>>>>>>> + * The resulting filter node is implicit.
>>>>>>>>
>>>>>>>> Why? It’s just as easy for the caller to just make it implicit if it
>>>>>>>> should be. (And usually the caller should decide that.)
>>>>>>>
>>>>>>> Personally, I don't know what the reasons are for filters to be
>>>>>>> implicit or not, I just made it like the other job filters.. I can
>>>>>>> move making it implicit to the caller, or drop it altogether (if
>>>>>>> that works).
>>>>>>
>>>>>> Nodes are implicit if they haven’t been added consciously by the user.
>>>>>> A node added by a block job can be non-implicit, too, as mirror shows;
>>>>>> if the user specifies the filter-node-name option, they will know
>>>>>> about the node, thus it is no longer implicit.
>>>>>>
>>>>>> If the user doesn’t know about the node (they didn’t give the
>>>>>> filter-node-name option), the node is implicit.
>>>>>>
>>>>>
>>>>> Ok, I understand it. But it doesn't show why it should be implicit?
>>>>> Isn't it less buggy to make all filters explicit? We don't hide
>>>>> implicit nodes from query-named-block-nodes (the only interface to
>>>>> explore the whole graph for now) anyway. And we can't absolutely hide
>>>>> the side effects of an additional node in the graph.
>>>>
>>>> Well, we try, at least. At least we hide them from query-block.
>>>>
>>>>> So, is there any real benefit of supporting the separation into
>>>>> implicit and explicit filters? It seems to me that it only
>>>>> complicates things... In other words, what will break if we make all
>>>>> filters explicit?
>>>>
>>>> The theory is that qemu may decide to add nodes at any point, but at
>>>> least when managing chains etc., they may not be visible to the user.
>>>> I don’t think we can get rid of them so easily.
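[Editorial note: the copy-before-write operation the diagram in the patch describes can be illustrated with a standalone sketch. This is plain C with invented names and byte-granularity "disks" for brevity; it is not QEMU code, which works on clusters and tracks progress with a dirty bitmap.]

```c
/* Minimal sketch of copy-before-write (CBW): before a guest write is
 * allowed through to the source, the old data of the affected region
 * is copied to the target, so the target ends up with a point-in-time
 * snapshot of the source as of the moment the backup started. */
#include <stdbool.h>
#include <string.h>

#define DISK_SIZE 8

static unsigned char source[DISK_SIZE];  /* active disk */
static unsigned char target[DISK_SIZE];  /* backup target */
static bool copied[DISK_SIZE];           /* already backed up? */

/* Guest writes are routed through this function (the "filter"). */
static void cbw_write(int off, const unsigned char *buf, int len)
{
    for (int i = off; i < off + len; i++) {
        if (!copied[i]) {           /* back up old data exactly once */
            target[i] = source[i];
            copied[i] = true;
        }
    }
    memcpy(source + off, buf, len); /* then let the write through */
}

/* The background job copies everything the guest never touched. */
static void backup_finish(void)
{
    for (int i = 0; i < DISK_SIZE; i++) {
        if (!copied[i]) {
            target[i] = source[i];
            copied[i] = true;
        }
    }
}
```

The point of the ordering is that the guest write can never destroy data that has not yet reached the target.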
>
> I think the important point to understand here is that implicit filters
> are a concept to maintain compatibility with legacy (pre-blockdev)
> management tools which simply don't manage stuff on the node level. So
> changing the structure of the graph is not a problem for them and we
> just need to make sure that when they do something with a BlockBackend,
> we perform the action on the node that they actually mean.
>
> Modern management tools are expected to manage the graph on a node level
> and to assign node names to everything.
>
> Note that libvirt is close to becoming a modern management tool, so it's
> probably a good idea to now make all block jobs add filters where we'll
> need them in the long run.
>
>>>> One example that isn’t implemented yet is copy-on-read. In theory,
>>>> specifying copy-on-read=on for -drive should create an implicit COR
>>>> node on top. The user shouldn’t see that node when inspecting the
>>>> drive or when issuing block jobs on it, etc. And this node has to
>>>> stay there when the user does e.g. an active commit somewhere down
>>>> the chain.
>>>
>>> Why not instead just document that yes, a filter is created when you
>>> enable copy-on-read?
>>
>> Because the problem is that existing users are not aware of that.
>>
>> If they were, they could simply create a COR filter already.
>>
>> I suppose we could interpret the deprecation policy in a way that we
>> could change the behavior of -drive copy-on-read=on, but as I already
>> said, what’s the point of supporting -drive copy-on-read=on when you
>> can simply explicitly create a COR filter on top?
>
> I actually changed my opinion on how to do this since we introduced
> implicit filters. I now think that the right way to move forward is to
> make sure that the copy-on-read filter can do everything it needs to do,
> and then just completely deprecate -drive copy-on-read=on instead of
> adding compatibility magic to turn it into an implicit copy-on-read
> filter internally.
Sure, so your answer to “what’s the point of supporting -drive
copy-on-read=on” is “there is none”. Fair enough. :-)

>>>> That sounds like a horrible ordeal to implement, so it hasn’t been
>>>> done yet. Maybe it never will be. It isn’t that bad for the job
>>>> filters, because they generally freeze the block graph, so there is
>>>> no problem with potential modifications.
>>>>
>>>> All in all I do think having implicit nodes makes sense. Maybe not so
>>>> much now, but in the future (if someone implements converting -drive
>>>> COR and throttle options to implicit nodes...).
>>>
>>> But do we at least have a strong definition of how implicit nodes
>>> should behave under any graph-modifying operations around them?
>>
>> No. Depends on the node.
>>
>>> Should new implicit/explicit filters be created above or under them?
>>
>> That was always the most difficult question we had when we introduced
>> filters.
>>
>> The problem is that we never answered it in our code base.
>>
>> One day, we just added filters (“let’s see what happens”), and in the
>> beginning, they were all explicit, so we still didn’t have to answer
>> this question (in code). Then job filters were added, and we still
>> didn’t have to, because they set blockers so the graph couldn’t be
>> reorganized with them in place anyway.
>>
>> If we converted copy-on-read=on to a COR node, we would have to answer
>> that question.
>
> That's why we shouldn't do that, but just remove copy-on-read=on. :-)

And BB-level throttling, fair enough. (Although the first step would
probably be to make throttle groups non-experimental? Like, drop the x-
prefix for all their parameters.)

>>>>> But should the filter really do that job, or is it here only to do
>>>>> CBW? Maybe the target must have another parent which will take care
>>>>> of flushing.
>>>>>
>>>>> Ok, I think I'll flush the target here too for safety, and leave a
>>>>> comment that something smarter would be better.
>>>>> (or, if we decide to flush all children by default in generic code,
>>>>> I'll drop this handler)
>>>>
>>>> [1] Thinking more about this, for normal backups the target file does
>>>> not reflect a valid disk state until the backup is done; just like
>>>> for qemu-img convert. Just like qemu-img convert, there is therefore
>>>> no reason to flush the target until the job is done.
>
> Depends, the target could have the source as its backing file (like
> fleecing, but without exporting it right now), and then you could always
> have a consistent view on the target. Well, almost.
>
> Almost, because to guarantee this, we'd have to flush between each CBW
> and the corresponding write to the same block, to make sure that the old
> data is backed up before it is overwritten.

Yes, that’s what I meant by “enforce on the host that the target is
always flushed before the source”. Well, I meant to say there is no good
way of doing that, and I kind of don’t consider this a good way.

> Of course, this would perform about as badly as internal COW in qcow2...
> So probably not a guarantee we should be making by default. But it might
> make sense as an option.

I don’t know. “Use this option so your data isn’t lost in case of a
power failure, but it makes everything pretty slow”? Who is going to
explicitly enable that (before their first data loss)?

Max
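[Editorial note: the per-block ordering discussed above — flush the target between each CBW and the corresponding guest write — can be sketched with plain POSIX file I/O. The function name, the fixed buffer, and the minimal error handling are invented for the illustration; real code would operate on clusters and batch work.]

```c
#include <fcntl.h>
#include <unistd.h>

/* Sketch: make the copy-before-write durable on the target before the
 * guest write may touch the source. A crash at any point then leaves
 * the target with a consistent copy of the old data. An fsync() per
 * guest write is what makes this about as slow as qcow2-internal COW. */
static int guest_write(int src_fd, int tgt_fd, off_t off,
                       const void *buf, size_t len)
{
    char old[4096];

    if (len > sizeof(old)) {
        return -1;                          /* keep the sketch simple */
    }

    /* 1. Copy-before-write: read the old data from the source... */
    if (pread(src_fd, old, len, off) != (ssize_t)len) {
        return -1;
    }
    /* ...and save it to the backup target. */
    if (pwrite(tgt_fd, old, len, off) != (ssize_t)len) {
        return -1;
    }

    /* 2. Flush, so the old data reaches stable storage first. */
    if (fsync(tgt_fd) != 0) {
        return -1;
    }

    /* 3. Only now let the guest write through to the source. */
    if (pwrite(src_fd, buf, len, off) != (ssize_t)len) {
        return -1;
    }
    return 0;
}
```

Dropping step 2 (or moving it to job completion) is exactly the cheaper default discussed above: the target is then only guaranteed consistent once the whole job has finished and been flushed.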