On 06.02.20 17:57, Kevin Wolf wrote: > Am 06.02.2020 um 17:43 hat Max Reitz geschrieben: >> On 06.02.20 16:51, Kevin Wolf wrote: >>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben: >>>> On 06.02.20 15:58, Kevin Wolf wrote: >>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben: >>>>>> On 05.02.20 16:38, Kevin Wolf wrote: >>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben: >>>>>>>> We will need this to verify that Quorum can let one of its children be >>>>>>>> replaced without breaking anything else. >>>>>>>> >>>>>>>> Signed-off-by: Max Reitz <mre...@redhat.com> >>>>>>>> --- >>>>>>>> block/quorum.c | 25 +++++++++++++++++++++++++ >>>>>>>> 1 file changed, 25 insertions(+) >>>>>>>> >>>>>>>> diff --git a/block/quorum.c b/block/quorum.c >>>>>>>> index 59cd524502..6a7224c9e4 100644 >>>>>>>> --- a/block/quorum.c >>>>>>>> +++ b/block/quorum.c >>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes { >>>>>>>> >>>>>>>> typedef struct QuorumChild { >>>>>>>> BdrvChild *child; >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * If set, check whether this node can be replaced without any >>>>>>>> + * other parent noticing: Unshare CONSISTENT_READ, and take the >>>>>>>> + * WRITE permission. >>>>>>>> + */ >>>>>>>> + bool to_be_replaced; >>>>>>> >>>>>>> I don't understand these permission changes. How does (preparing for) >>>>>>> detaching a node from quorum make its content invalid? >>>>>> >>>>>> It doesn’t, of course. What we are preparing for is to replace it by >>>>>> some other node with some other content. >>>>>> >>>>>>> And why do we >>>>>>> suddenly need WRITE permissions even if the quorum node is only used >>>>>>> read-only? >>>>>>> >>>>>>> The comment is a bit unclear, too. "check whether" implies that both >>>>>>> outcomes could be true, but it doesn't say what happens in either case. >>>>>>> Is this really "make sure that"? >>>>>> >>>>>> I think the comment is not only unclear, it is the problem. (Well, >>>>>> maybe the code is also.) >>>>>> >>>>>> This series is about fixing at least some things about replacing nodes >>>>>> by mirroring. The original use cases this was introduced for was to fix >>>>>> broken quorum children: The other children are still intact, so you read >>>>>> from the quorum node and replace the broken child (which maybe shows >>>>>> invalid data, or maybe just EIO) by the fixed mirror result. >>>>>> >>>>>> Replacing that broken node by the fixed one changes the data that’s >>>>>> visible on that node. >>>>> >>>>> Hm, yes, that's true. But I wonder if this is really something that the >>>>> permission system must catch. Like other graph manipulations, it's >>>>> essentially the user saying "trust me, I know what I'm doing, this node >>>>> makes sense in this place". >>>>> >>>>> Because if you assume that the user could add a node with unsuitable >>>>> content and you want to prevent this, where do we stop? >>>>> blockdev-snapshot can insert a non-empty overlay, which would result in >>>>> visible data change. Should we therefore only allow snapshots when >>>>> shared writes are allowed? This doesn't work obviously. >>>>> >>>>> So I'm inclined to say that this is the user's responsibility and we >>>>> don't have to jump through hoops to prevent every possible way that the >>>>> user could mess up. (Which often also result in preventing legitimate >>>>> cases like here a quorum of read-only nodes.) >>>> >>>> Well, if you ask the question “where do we stop”, we also have to ask >>>> the question “where do we start”. If we say the user knows what they’re >>>> doing, we might as well drop the whole can_replace infrastructure >>>> altogether and just assume that you can replace any node by anything. >>> >>> Well, I don't actually know if that would be completely unreasonable. >>> The idea was obviously to keep graph changes restricted to very specific >>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we >>> have quite a few more operations that allow changing the graph. >>> >>> So if preventing some cases gives us headaches and is probably more work >>> than dealing with any bugs they might reveal, maybe preventing them is >>> wrong. >>> >>> I'm just afraid that we might be overengineering this and waste time on >>> things that we don't actually get much use from. >> >> That’s why I’m asking. > > Did I answer your question sufficiently then?
No, because “I’m afraid” is a sentiment I fully share, but it doesn’t answer the question whether we are indeed overengineering this or not. :-) I suppose my stance now is “This is probably overengineered, but now we might as well roll with it”. >>>> If the WRITE permission is the problem, then I suppose we can drop that. >>>> Unsharing CONSISTENT_READ is bad enough that it effectively deters all >>>> other parents anyway. >>> >>> WRITE is probably the more practical problem, though it's technically >>> the correct one to take. >>> >>> CONSISTENT_READ is already a problem in theory because replacing a child >>> node with different content doesn't even match its definition: >>> >>> /** >>> * A user that has the "permission" of consistent reads is guaranteed >>> that >>> * their view of the contents of the block device is complete and >>> * self-consistent, representing the contents of a disk at a specific >>> * point. >>> * >>> * For most block devices (including their backing files) this is true, >>> but >>> * the property cannot be maintained in a few situations like for >>> * intermediate nodes of a commit block job. >>> */ >>> BLK_PERM_CONSISTENT_READ = 0x01, >>> >>> Replacing an image with a different image means that the node represents >>> the content of a different disk now, but it's probably still complete >>> and self-consistent. >> >> At any point in time yes, but not over the time span of the change. The >> definition doesn’t say that the node represents the contents of a disk >> at a specific point, but the view from the parent. >> >> I argue that that view is always over some period of time, so if you >> suddenly switch out the whole disk, then it isn’t a self-consistent view. > > I think your theory that it's over some period of time conflicts with > the documentation that says "at a specific point". I’d rather not get into a deeper discussion on what CONSISTENT_READ means again... :-/ I always feel like if you really take only a single point in time, then anything could be some hypothetical disk. So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch this, you don’t want to”. >> Alternatively, we could of course also just forego the permission system >> here altogether and just check that there are no other parents at all. >> (Which is effectively the same as unsharing CONSISTENT_READ.) > > This would sidestep all of the artificial permission twiddling, which > sounds good. > > It would probably also needlessly restrict the allowed use cases, Only in theory, though, because in practice basically everything useful takes CONSISTENT_READ anyway. > but > then, who cares about nodes with multiple parents, one of which is a > quorum node? > > So I guess I would be fine with either checking that there are no > parents or maybe even just dropping the check completely. OK, I’ll check the parent list then. (Except it must be exactly one parent, namely Quorum.) Max
signature.asc
Description: OpenPGP digital signature