Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-12 Thread Kevin Wolf
Am 11.09.2019 um 13:00 hat Max Reitz geschrieben:
> On 11.09.19 12:31, Kevin Wolf wrote:
> > Am 11.09.2019 um 12:00 hat Max Reitz geschrieben:
> >> So all in all I think it’s best to make the callback mandatory and add
> >> two global helper functions.  That’s simple enough and should prevent
> >> us from making mistakes by forgetting to adjust something in the
> >> future.
> > 
> > Yes, that should work.
> > 
> > We should probably still figure out what the relationship between the
> > child access functions and child roles is, even if we don't need it for
> > this solution. But it feels like an important part of the design.
> 
> Hm.  It feels like something that should be done before this series,
> actually.
> 
> So I think we should add at least a child role per child access function
> so that they match?  And then maybe in bdrv_attach_child() assert that a
> BDS never has more than one primary or filtered child (a filtered child
> acts as a primary child, too), or more than one COW child.  (And that
> these are always in bs->file or bs->backing so the child access
> functions do work.)

Makes sense to me.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Max Reitz
On 11.09.19 12:31, Kevin Wolf wrote:
> Am 11.09.2019 um 12:00 hat Max Reitz geschrieben:
>> On 11.09.19 10:27, Kevin Wolf wrote:
>>> Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
 On 11.09.19 08:55, Kevin Wolf wrote:
> Well, by default the primary child, which should cover like 90% of the
> drivers?

 Hm, yes.

 But I still think that the drivers that do not want to count every
 single non-COW child are the exception.
>>>
>>> They are, but drivers that want to count more than their primary node
>>> are exceptions, too. And I think you're more likely to remember adding
>>> the callback when you want to have a certain feature, not when you don't
>>> want to have it.
>>>
>>> I really think we're likely to forget adding the callback where we need
>>> to disable the feature.
>>
>> Well, I mean, we did forget adding it for qcow2.
> 
> I'm afraid I have to agree. So the conclusion is that we won't get it
> right anyway?
> 
>>> I can see two options that should address both of our views:
>>>
>>> 1. Just don't have a fallback at all, make the callback mandatory and
>>>provide implementations in block.c that can be referred to in
>>>BlockDriver. Not specifying the callback causes an assertion failure,
>>>so we'd hopefully notice it quite early (assuming that we run either
>>>'qemu-img info' or 'query-block' on a configuration with the block
>>>driver, but I think that's faily safe to assume).
>>
>> Hm.  Seems a bit much, but if we can’t agree on what’s a good general
>> implementation that works for everything, this is probably the only
>> thing that would actually keep us from forgetting to add special cases.
>>
>> Though I actually don’t know.  I’d probably add two globally available
>> helpers, one that returns the sum of everything but the backing node,
>> and one that just returns the primary node.
> 
> Yes, I think this is the same as I meant by "provide implementations in
> block.c".
> 
>> Now if I were to make qcow2 use the primary node helper function, would
>> we have remembered changing it once we added a data file?
>>
>> Hmm.  Maybe not, but it should be OK to just make everything use the sum
>> helper, except the drivers that want the primary node.  That should work
>> for all cases.  (I think that whenever a format driver suddenly gains
>> more child nodes, we probably will want to count them.  OTOH, everything
>> that has nodes that shouldn’t be counted probably always wants to use
>> the primary node helper function from the start.)
> 
> The job filter nodes have only one child currently, which should be
> counted. We'll add other children that shouldn't be counted only later.
> 
> But we already have an idea of what possible extensions look like, so we
> can probably choose the right function from the start.

Yep.

>>> 2. Make the 90% solution a 100% solution: Allow drivers to have multiple
>>>storage children (for vmdk) and then have the fallback add up the
>>>primary child plus all storage children. This is what I suggested as
>>>the documented semantics in my initial reply to this patch (that you
>>>chose not to answer).
>>
>> I didn’t answer that because I didn’t disagree.
>>
>>>Adding the size of storage children covers qcow2 and vmdk.
>>
>> That’s of course exactly what we’re trying to do, but the question is,
>> how do we figure out that storage children?  Make it a per-BdrvChild
>> attribute?  That seems rather heavy-handed, because I think we’d need it
>> only here.
> 
> Well, you added bdrv_storage_child().I'd argue this interface is wrong

Yes, it probably is.

> because it assumes that only one storage child exists. You just didn't
> implement it for vmdk so that the problem didn't become apparent. It
> would have to return a list rather than a single child. So fixing the
> interface and then using it is what I was thinking.
> 
> Now that you mention a per-BdrvChild attribute, however, I start to
> wonder if the distinction between COW children, filter children, storage
> children, metadata children, etc. isn't really what BdrvChildRole was
> supposed to represent?

That’s a good point.

> Maybe we want to split off child_storage from child_file, though it's
> not strictly necessary for this specific case because we want to treat
> both metadata and storage nodes the same. But it could be useful for
> other users of bdrv_storage_child(), if there are any.

Possible.  Maybe it turns out that at least for this series I don’t need
bdrv_storage_child() at all.

>>>As the job filter won't declare the target or any other involved
>>>nodes their storage nodes (I hope), this will do the right thing for
>>>them, too.
>>>
>>>For quorum and blkverify both ways could be justifiable. I think they
>>>probably shouldn't declare their children as storage nodes. They are
>>>more like filters that don't have a single filtered node. So some
>>>kind of almost-filters.
>>
>> I don’t think quorum is a 

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Kevin Wolf
Am 11.09.2019 um 12:00 hat Max Reitz geschrieben:
> On 11.09.19 10:27, Kevin Wolf wrote:
> > Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
> >> On 11.09.19 08:55, Kevin Wolf wrote:
> >>> Well, by default the primary child, which should cover like 90% of the
> >>> drivers?
> >>
> >> Hm, yes.
> >>
> >> But I still think that the drivers that do not want to count every
> >> single non-COW child are the exception.
> > 
> > They are, but drivers that want to count more than their primary node
> > are exceptions, too. And I think you're more likely to remember adding
> > the callback when you want to have a certain feature, not when you don't
> > want to have it.
> > 
> > I really think we're likely to forget adding the callback where we need
> > to disable the feature.
> 
> Well, I mean, we did forget adding it for qcow2.

I'm afraid I have to agree. So the conclusion is that we won't get it
right anyway?

> > I can see two options that should address both of our views:
> > 
> > 1. Just don't have a fallback at all, make the callback mandatory and
> >provide implementations in block.c that can be referred to in
> >BlockDriver. Not specifying the callback causes an assertion failure,
> >so we'd hopefully notice it quite early (assuming that we run either
> >'qemu-img info' or 'query-block' on a configuration with the block
> >driver, but I think that's faily safe to assume).
> 
> Hm.  Seems a bit much, but if we can’t agree on what’s a good general
> implementation that works for everything, this is probably the only
> thing that would actually keep us from forgetting to add special cases.
> 
> Though I actually don’t know.  I’d probably add two globally available
> helpers, one that returns the sum of everything but the backing node,
> and one that just returns the primary node.

Yes, I think this is the same as I meant by "provide implementations in
block.c".

> Now if I were to make qcow2 use the primary node helper function, would
> we have remembered changing it once we added a data file?
> 
> Hmm.  Maybe not, but it should be OK to just make everything use the sum
> helper, except the drivers that want the primary node.  That should work
> for all cases.  (I think that whenever a format driver suddenly gains
> more child nodes, we probably will want to count them.  OTOH, everything
> that has nodes that shouldn’t be counted probably always wants to use
> the primary node helper function from the start.)

The job filter nodes have only one child currently, which should be
counted. We'll add other children that shouldn't be counted only later.

But we already have an idea of what possible extensions look like, so we
can probably choose the right function from the start.

> > 2. Make the 90% solution a 100% solution: Allow drivers to have multiple
> >storage children (for vmdk) and then have the fallback add up the
> >primary child plus all storage children. This is what I suggested as
> >the documented semantics in my initial reply to this patch (that you
> >chose not to answer).
> 
> I didn’t answer that because I didn’t disagree.
> 
> >Adding the size of storage children covers qcow2 and vmdk.
> 
> That’s of course exactly what we’re trying to do, but the question is,
> how do we figure out that storage children?  Make it a per-BdrvChild
> attribute?  That seems rather heavy-handed, because I think we’d need it
> only here.

Well, you added bdrv_storage_child(). I'd argue this interface is wrong
because it assumes that only one storage child exists. You just didn't
implement it for vmdk so that the problem didn't become apparent. It
would have to return a list rather than a single child. So fixing the
interface and then using it is what I was thinking.

Now that you mention a per-BdrvChild attribute, however, I start to
wonder if the distinction between COW children, filter children, storage
children, metadata children, etc. isn't really what BdrvChildRole was
supposed to represent?

Maybe we want to split off child_storage from child_file, though it's
not strictly necessary for this specific case because we want to treat
both metadata and storage nodes the same. But it could be useful for
other users of bdrv_storage_child(), if there are any.

> >As the job filter won't declare the target or any other involved
> >nodes their storage nodes (I hope), this will do the right thing for
> >them, too.
> > 
> >For quorum and blkverify both ways could be justifiable. I think they
> >probably shouldn't declare their children as storage nodes. They are
> >more like filters that don't have a single filtered node. So some
> >kind of almost-filters.
> 
> I don’t think quorum is a filter, and blkverify can only be justified to
> be a filter because it quits qemu when there is a mismatch.
> 
> The better example is replication, but that has a clear filtered child
> (the primary node).
> 
> 
> So all in all I think it’s best to make the callback 

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Max Reitz
On 11.09.19 10:27, Kevin Wolf wrote:
> Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
>> On 11.09.19 08:55, Kevin Wolf wrote:
>>> Am 11.09.2019 um 08:20 hat Max Reitz geschrieben:
 On 10.09.19 16:52, Kevin Wolf wrote:
> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>> If the driver does not implement bdrv_get_allocated_file_size(), we
>> should fall back to cumulating the allocated size of all non-COW
>> children instead of just bs->file.
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Max Reitz 
>
> This smells like an overgeneralisation, but if we want to count all vmdk
> extents, the qcow2 external data file, etc. it's an improvement anyway.
> A driver that has a child that should not be counted must just remember
> to implement the callback.
>
> Let me think of an example... How about quorum, for a change? :-)
> Or the second blkverify child.
>
> Or eventually the block job filter nodes.

 I actually think it makes sense for all of these nodes to report the sum
 of all of their children’s allocated sizes.
>>>
>>> Hm... Yes, in a way. But not much more than it would make sense to
>>> report the sum of the sizes of all images in the whole backing chain
>>> (this is a useful thing to ask for, just maybe not the right thing to
>>> return for a low-level interface). But I can accept that it's maybe a
>>> bit more expected for quorum and blkverify than for COW images.
>>>
>>> If you include the block job filter nodes, I have to disagree, though.
>>> If mirror_top_bs (or any other job filter) sits in the middle of the
>>> source chain, then I certainly don't want to see the target size added
>>> to it.
>>
>> Hm, I don’t care much either way.  I think it makes complete sense to
>> add the target size there, but OTOH it’s only temporary while the job
>> runs, so it may be a bit confusing if it suddenly goes up and then down
>> again.
> 
> I think the number that most users are interested in is knowing how much
> space the image for their /dev/vda takes up on the host.
> 
> I can see how they might be interested in not only that one image file,
> but all other image files connected to it, i.e. their /dev/vda with all
> of its snapshots. This would mean counting backing files. I think adding
> up the numbers for this should be done in the management layer.

My main argument against counting backing files is that we’ve never done it.

(Whereas for quorum, I’d argue we just forgot to adjust
bdrv_get_allocated_file_size() for it.)

> I can possibly also imagine users wanting to count everything that's
> even loosely connected to their /dev/vda, like copies of it. I doubt,
> however, they want to count only copies that are currently being made,
> but not snapshots and copies that have been completed earlier. So this
> is clearly a management layer thing, too.

OK.

>> But I think this is the special case, so this is what should be handled
>> in a driver callback.
> 
> It's a special case, yes. But see below.
> 
 If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
 and 2 MB, respectively (totally possible if some have explicit zeroes
 and others don’t; it may also depend on the protocol, the filesystem,
 etc.), then I think it makes most sense to report indeed 6 MB for the
 quorum subtree as a whole.  What would you report?  3 MB?
>>>
>>> Do it's the quorum way: Just vote!
>>
>> Add an option for it?  Average, maximum, median, majority, sum? :-)
> 
> We could also introduce a mode with an Electoral College so that
> sometimes an image that missed the majority has a chance to win anyway.

That’s actually a good idea for a quorum mode in general.  Who says the
majority is right?  Better let someone with more authority cross-check
the result.

>>> No, you're right, of course. -ENOTSUP is probably the only other thing
>>> you could do then.
>>>
> Ehm... Maybe I should just take back what I said first. It almost feels
> like it would be better if qcow2 and vmdk explicitly used a handler that
> counts all children (could still be a generic one in block.c) rather
> than having to remember to disable the functionality everywhere where we
> don't want to have it.

 I don’t, because everywhere we don’t want this functionality, we still
 need to choose a child.  This has to be done by the driver anyway.
>>>
>>> Well, by default the primary child, which should cover like 90% of the
>>> drivers?
>>
>> Hm, yes.
>>
>> But I still think that the drivers that do not want to count every
>> single non-COW child are the exception.
> 
> They are, but drivers that want to count more than their primary node
> are exceptions, too. And I think you're more likely to remember adding
> the callback when you want to have a certain feature, not when you don't
> want to have it.
> 
> I really think we're likely to forget adding the callback where we need
> to disable 

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Kevin Wolf
Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
> On 11.09.19 08:55, Kevin Wolf wrote:
> > Am 11.09.2019 um 08:20 hat Max Reitz geschrieben:
> >> On 10.09.19 16:52, Kevin Wolf wrote:
> >>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>  If the driver does not implement bdrv_get_allocated_file_size(), we
>  should fall back to cumulating the allocated size of all non-COW
>  children instead of just bs->file.
> 
>  Suggested-by: Vladimir Sementsov-Ogievskiy 
>  Signed-off-by: Max Reitz 
> >>>
> >>> This smells like an overgeneralisation, but if we want to count all vmdk
> >>> extents, the qcow2 external data file, etc. it's an improvement anyway.
> >>> A driver that has a child that should not be counted must just remember
> >>> to implement the callback.
> >>>
> >>> Let me think of an example... How about quorum, for a change? :-)
> >>> Or the second blkverify child.
> >>>
> >>> Or eventually the block job filter nodes.
> >>
> >> I actually think it makes sense for all of these nodes to report the sum
> >> of all of their children’s allocated sizes.
> > 
> > Hm... Yes, in a way. But not much more than it would make sense to
> > report the sum of the sizes of all images in the whole backing chain
> > (this is a useful thing to ask for, just maybe not the right thing to
> > return for a low-level interface). But I can accept that it's maybe a
> > bit more expected for quorum and blkverify than for COW images.
> > 
> > If you include the block job filter nodes, I have to disagree, though.
> > If mirror_top_bs (or any other job filter) sits in the middle of the
> > source chain, then I certainly don't want to see the target size added
> > to it.
> 
> Hm, I don’t care much either way.  I think it makes complete sense to
> add the target size there, but OTOH it’s only temporary while the job
> runs, so it may be a bit confusing if it suddenly goes up and then down
> again.

I think the number that most users are interested in is knowing how much
space the image for their /dev/vda takes up on the host.

I can see how they might be interested in not only that one image file,
but all other image files connected to it, i.e. their /dev/vda with all
of its snapshots. This would mean counting backing files. I think adding
up the numbers for this should be done in the management layer.

I can possibly also imagine users wanting to count everything that's
even loosely connected to their /dev/vda, like copies of it. I doubt,
however, they want to count only copies that are currently being made,
but not snapshots and copies that have been completed earlier. So this
is clearly a management layer thing, too.

> But I think this is the special case, so this is what should be handled
> in a driver callback.

It's a special case, yes. But see below.

> >> If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
> >> and 2 MB, respectively (totally possible if some have explicit zeroes
> >> and others don’t; it may also depend on the protocol, the filesystem,
> >> etc.), then I think it makes most sense to report indeed 6 MB for the
> >> quorum subtree as a whole.  What would you report?  3 MB?
> > 
> > Do it's the quorum way: Just vote!
> 
> Add an option for it?  Average, maximum, median, majority, sum? :-)

We could also introduce a mode with an Electoral College so that
sometimes an image that missed the majority has a chance to win anyway.

> > No, you're right, of course. -ENOTSUP is probably the only other thing
> > you could do then.
> > 
> >>> Ehm... Maybe I should just take back what I said first. It almost feels
> >>> like it would be better if qcow2 and vmdk explicitly used a handler that
> >>> counts all children (could still be a generic one in block.c) rather
> >>> than having to remember to disable the functionality everywhere where we
> >>> don't want to have it.
> >>
> >> I don’t, because everywhere we don’t want this functionality, we still
> >> need to choose a child.  This has to be done by the driver anyway.
> > 
> > Well, by default the primary child, which should cover like 90% of the
> > drivers?
> 
> Hm, yes.
> 
> But I still think that the drivers that do not want to count every
> single non-COW child are the exception.

They are, but drivers that want to count more than their primary node
are exceptions, too. And I think you're more likely to remember adding
the callback when you want to have a certain feature, not when you don't
want to have it.

I really think we're likely to forget adding the callback where we need
to disable the feature.

I can see two options that should address both of our views:

1. Just don't have a fallback at all, make the callback mandatory and
   provide implementations in block.c that can be referred to in
   BlockDriver. Not specifying the callback causes an assertion failure,
   so we'd hopefully notice it quite early (assuming that we run either
   'qemu-img info' or 'query-block' on a configuration with the block
   

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Max Reitz
On 11.09.19 08:55, Kevin Wolf wrote:
> Am 11.09.2019 um 08:20 hat Max Reitz geschrieben:
>> On 10.09.19 16:52, Kevin Wolf wrote:
>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
 If the driver does not implement bdrv_get_allocated_file_size(), we
 should fall back to cumulating the allocated size of all non-COW
 children instead of just bs->file.

 Suggested-by: Vladimir Sementsov-Ogievskiy 
 Signed-off-by: Max Reitz 
>>>
>>> This smells like an overgeneralisation, but if we want to count all vmdk
>>> extents, the qcow2 external data file, etc. it's an improvement anyway.
>>> A driver that has a child that should not be counted must just remember
>>> to implement the callback.
>>>
>>> Let me think of an example... How about quorum, for a change? :-)
>>> Or the second blkverify child.
>>>
>>> Or eventually the block job filter nodes.
>>
>> I actually think it makes sense for all of these nodes to report the sum
>> of all of their children’s allocated sizes.
> 
> Hm... Yes, in a way. But not much more than it would make sense to
> report the sum of the sizes of all images in the whole backing chain
> (this is a useful thing to ask for, just maybe not the right thing to
> return for a low-level interface). But I can accept that it's maybe a
> bit more expected for quorum and blkverify than for COW images.
> 
> If you include the block job filter nodes, I have to disagree, though.
> If mirror_top_bs (or any other job filter) sits in the middle of the
> source chain, then I certainly don't want to see the target size added
> to it.

Hm, I don’t care much either way.  I think it makes complete sense to
add the target size there, but OTOH it’s only temporary while the job
runs, so it may be a bit confusing if it suddenly goes up and then down
again.

But I think this is the special case, so this is what should be handled
in a driver callback.

>> If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
>> and 2 MB, respectively (totally possible if some have explicit zeroes
>> and others don’t; it may also depend on the protocol, the filesystem,
>> etc.), then I think it makes most sense to report indeed 6 MB for the
>> quorum subtree as a whole.  What would you report?  3 MB?
> 
> Do it's the quorum way: Just vote!

Add an option for it?  Average, maximum, median, majority, sum? :-)

> No, you're right, of course. -ENOTSUP is probably the only other thing
> you could do then.
> 
>>> Ehm... Maybe I should just take back what I said first. It almost feels
>>> like it would be better if qcow2 and vmdk explicitly used a handler that
>>> counts all children (could still be a generic one in block.c) rather
>>> than having to remember to disable the functionality everywhere where we
>>> don't want to have it.
>>
>> I don’t, because everywhere we don’t want this functionality, we still
>> need to choose a child.  This has to be done by the driver anyway.
> 
> Well, by default the primary child, which should cover like 90% of the
> drivers?

Hm, yes.

But I still think that the drivers that do not want to count every
single non-COW child are the exception.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Kevin Wolf
Am 11.09.2019 um 08:20 hat Max Reitz geschrieben:
> On 10.09.19 16:52, Kevin Wolf wrote:
> > Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
> >> If the driver does not implement bdrv_get_allocated_file_size(), we
> >> should fall back to cumulating the allocated size of all non-COW
> >> children instead of just bs->file.
> >>
> >> Suggested-by: Vladimir Sementsov-Ogievskiy 
> >> Signed-off-by: Max Reitz 
> > 
> > This smells like an overgeneralisation, but if we want to count all vmdk
> > extents, the qcow2 external data file, etc. it's an improvement anyway.
> > A driver that has a child that should not be counted must just remember
> > to implement the callback.
> > 
> > Let me think of an example... How about quorum, for a change? :-)
> > Or the second blkverify child.
> > 
> > Or eventually the block job filter nodes.
> 
> I actually think it makes sense for all of these nodes to report the sum
> of all of their children’s allocated sizes.

Hm... Yes, in a way. But not much more than it would make sense to
report the sum of the sizes of all images in the whole backing chain
(this is a useful thing to ask for, just maybe not the right thing to
return for a low-level interface). But I can accept that it's maybe a
bit more expected for quorum and blkverify than for COW images.

If you include the block job filter nodes, I have to disagree, though.
If mirror_top_bs (or any other job filter) sits in the middle of the
source chain, then I certainly don't want to see the target size added
to it.

> If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
> and 2 MB, respectively (totally possible if some have explicit zeroes
> and others don’t; it may also depend on the protocol, the filesystem,
> etc.), then I think it makes most sense to report indeed 6 MB for the
> quorum subtree as a whole.  What would you report?  3 MB?

Do it's the quorum way: Just vote!

No, you're right, of course. -ENOTSUP is probably the only other thing
you could do then.

> > Ehm... Maybe I should just take back what I said first. It almost feels
> > like it would be better if qcow2 and vmdk explicitly used a handler that
> > counts all children (could still be a generic one in block.c) rather
> > than having to remember to disable the functionality everywhere where we
> > don't want to have it.
> 
> I don’t, because everywhere we don’t want this functionality, we still
> need to choose a child.  This has to be done by the driver anyway.

Well, by default the primary child, which should cover like 90% of the
drivers?

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-11 Thread Max Reitz
On 10.09.19 16:52, Kevin Wolf wrote:
> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>> If the driver does not implement bdrv_get_allocated_file_size(), we
>> should fall back to cumulating the allocated size of all non-COW
>> children instead of just bs->file.
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Max Reitz 
> 
> This smells like an overgeneralisation, but if we want to count all vmdk
> extents, the qcow2 external data file, etc. it's an improvement anyway.
> A driver that has a child that should not be counted must just remember
> to implement the callback.
> 
> Let me think of an example... How about quorum, for a change? :-)
> Or the second blkverify child.
> 
> Or eventually the block job filter nodes.

I actually think it makes sense for all of these nodes to report the sum
of all of their children’s allocated sizes.

If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
and 2 MB, respectively (totally possible if some have explicit zeroes
and others don’t; it may also depend on the protocol, the filesystem,
etc.), then I think it makes most sense to report indeed 6 MB for the
quorum subtree as a whole.  What would you report?  3 MB?

> Ehm... Maybe I should just take back what I said first. It almost feels
> like it would be better if qcow2 and vmdk explicitly used a handler that
> counts all children (could still be a generic one in block.c) rather
> than having to remember to disable the functionality everywhere where we
> don't want to have it.

I don’t, because everywhere we don’t want this functionality, we still
need to choose a child.  This has to be done by the driver anyway.

Max

> And please adjust the comment for bdrv_get_allocated_file_size(), it
> only talks about a single file as if trees didn't exist. Actually, it
> doesn't even seem so easy to define. Maybe primary node + storage nodes?
> Then vmdk needs to expose its extents as storage nodes (plural!), but
> in the long run that might be needed anyway.



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-09-10 Thread Kevin Wolf
Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
> If the driver does not implement bdrv_get_allocated_file_size(), we
> should fall back to cumulating the allocated size of all non-COW
> children instead of just bs->file.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 

This smells like an overgeneralisation, but if we want to count all vmdk
extents, the qcow2 external data file, etc. it's an improvement anyway.
A driver that has a child that should not be counted must just remember
to implement the callback.

Let me think of an example... How about quorum, for a change? :-)
Or the second blkverify child.

Or eventually the block job filter nodes.

Ehm... Maybe I should just take back what I said first. It almost feels
like it would be better if qcow2 and vmdk explicitly used a handler that
counts all children (could still be a generic one in block.c) rather
than having to remember to disable the functionality everywhere where we
don't want to have it.

And please adjust the comment for bdrv_get_allocated_file_size(), it
only talks about a single file as if trees didn't exist. Actually, it
doesn't even seem so easy to define. Maybe primary node + storage nodes?
Then vmdk needs to expose its extents as storage nodes (plural!), but
in the long run that might be needed anyway.

Kevin



Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-12 Thread Max Reitz
On 12.08.19 19:14, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 16:09, Max Reitz wrote:
>> On 10.08.19 18:41, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.08.2019 19:13, Max Reitz wrote:
 If the driver does not implement bdrv_get_allocated_file_size(), we
 should fall back to cumulating the allocated size of all non-COW
 children instead of just bs->file.

 Suggested-by: Vladimir Sementsov-Ogievskiy 
 Signed-off-by: Max Reitz 
 ---
block.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)

 diff --git a/block.c b/block.c
 index 1070aa1ba9..6e1ddab056 100644
 --- a/block.c
 +++ b/block.c
 @@ -4650,9 +4650,27 @@ int64_t 
 bdrv_get_allocated_file_size(BlockDriverState *bs)
if (drv->bdrv_get_allocated_file_size) {
return drv->bdrv_get_allocated_file_size(bs);
}
 -if (bs->file) {
 -return bdrv_get_allocated_file_size(bs->file->bs);
 +
 +if (!QLIST_EMPTY(>children)) {
 +BdrvChild *child;
 +int64_t child_size, total_size = 0;
 +
 +QLIST_FOREACH(child, >children, next) {
 +if (child == bdrv_filtered_cow_child(bs)) {
 +/* Ignore COW backing files */
 +continue;
 +}
 +
 +child_size = bdrv_get_allocated_file_size(child->bs);
 +if (child_size < 0) {
 +return child_size;
 +}
 +total_size += child_size;
 +}
 +
 +return total_size;
}
 +
return -ENOTSUP;
}


>>>
>>> Hmm..
>>>
>>> 1. No children -> -ENOTSUP
>>> 2. Only cow child -> 0
>>> 3. Some non-cow children -> SUM
>>>
>>> It's all arguable (the strictest way is -ENOTSUP in either case),
>>> but if we want to fallback to SUM of non-cow children, 1. and 2. should 
>>> return
>>> the same.
>>
>> I don’t think 2 is possible at all.  If you have a COW child, you need
>> some other child to COW to.
>>
>> And in the weird (and probably impossible) case where a node really only
>> has a COW child, I’d say it’s correct that it has a disk size of 0 –
>> because it hasn’t COWed anything yet.  (Just like a new qcow2 image with
>> a backing file only has its metadata as its disk size.)
>>
> 
> Agreed. Then, why not return 0 on [1] ?

(1) Because that’s the current behavior. :-)

(2) Nodes that have no children are protocol nodes.  Protocol nodes
(apart from null) still have to store their data somewhere.  Therefore,
they must implement .bdrv_get_allocated_file_size() to report that.  If
they don’t, that doesn’t mean they don’t store any data, but only that
we don’t know how much data they store.

> Also, another idea: shouldn't we return 0 for filters, i.e. skip 
> filtered_rw_child too?
> [as filtered-child is more like backing child than file one, it's "less 
> owned" by its parent]

Why would we do that?  If I have a block device with a throttle node
attached to it and request how much space it uses, of course I will want
to know how much space the whole tree below it uses.

(Otherwise, bdrv_get_allocated_file_size() should only report anything
for protocol nodes, and 0 for everything else.)

Max

> with or without any of these suggestions:
> Reviewed-by: Vladimir Sementsov-Ogievskiy 





signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-12 Thread Vladimir Sementsov-Ogievskiy
12.08.2019 16:09, Max Reitz wrote:
> On 10.08.19 18:41, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 19:13, Max Reitz wrote:
>>> If the driver does not implement bdrv_get_allocated_file_size(), we
>>> should fall back to cumulating the allocated size of all non-COW
>>> children instead of just bs->file.
>>>
>>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>>> Signed-off-by: Max Reitz 
>>> ---
>>>block.c | 22 --
>>>1 file changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 1070aa1ba9..6e1ddab056 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -4650,9 +4650,27 @@ int64_t 
>>> bdrv_get_allocated_file_size(BlockDriverState *bs)
>>>if (drv->bdrv_get_allocated_file_size) {
>>>return drv->bdrv_get_allocated_file_size(bs);
>>>}
>>> -if (bs->file) {
>>> -return bdrv_get_allocated_file_size(bs->file->bs);
>>> +
>>> +if (!QLIST_EMPTY(>children)) {
>>> +BdrvChild *child;
>>> +int64_t child_size, total_size = 0;
>>> +
>>> +QLIST_FOREACH(child, >children, next) {
>>> +if (child == bdrv_filtered_cow_child(bs)) {
>>> +/* Ignore COW backing files */
>>> +continue;
>>> +}
>>> +
>>> +child_size = bdrv_get_allocated_file_size(child->bs);
>>> +if (child_size < 0) {
>>> +return child_size;
>>> +}
>>> +total_size += child_size;
>>> +}
>>> +
>>> +return total_size;
>>>}
>>> +
>>>return -ENOTSUP;
>>>}
>>>
>>>
>>
>> Hmm..
>>
>> 1. No children -> -ENOTSUP
>> 2. Only cow child -> 0
>> 3. Some non-cow children -> SUM
>>
>> It's all arguable (the strictest way is -ENOTSUP in either case),
>> but if we want to fallback to SUM of non-cow children, 1. and 2. should 
>> return
>> the same.
> 
> I don’t think 2 is possible at all.  If you have a COW child, you need
> some other child to COW to.
> 
> And in the weird (and probably impossible) case where a node really only
> has a COW child, I’d say it’s correct that it has a disk size of 0 –
> because it hasn’t COWed anything yet.  (Just like a new qcow2 image with
> a backing file only has its metadata as its disk size.)
> 

Agreed. Then, why not return 0 on [1] ?

Also, another idea: shouldn't we return 0 for filters, i.e. skip 
filtered_rw_child too?
[as filtered-child is more like backing child than file one, it's "less owned" 
by its parent]

with or without any of these suggestions:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-12 Thread Max Reitz
On 10.08.19 18:41, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 19:13, Max Reitz wrote:
>> If the driver does not implement bdrv_get_allocated_file_size(), we
>> should fall back to cumulating the allocated size of all non-COW
>> children instead of just bs->file.
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Max Reitz 
>> ---
>>   block.c | 22 --
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 1070aa1ba9..6e1ddab056 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4650,9 +4650,27 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
>> *bs)
>>   if (drv->bdrv_get_allocated_file_size) {
>>   return drv->bdrv_get_allocated_file_size(bs);
>>   }
>> -if (bs->file) {
>> -return bdrv_get_allocated_file_size(bs->file->bs);
>> +
>> +if (!QLIST_EMPTY(>children)) {
>> +BdrvChild *child;
>> +int64_t child_size, total_size = 0;
>> +
>> +QLIST_FOREACH(child, >children, next) {
>> +if (child == bdrv_filtered_cow_child(bs)) {
>> +/* Ignore COW backing files */
>> +continue;
>> +}
>> +
>> +child_size = bdrv_get_allocated_file_size(child->bs);
>> +if (child_size < 0) {
>> +return child_size;
>> +}
>> +total_size += child_size;
>> +}
>> +
>> +return total_size;
>>   }
>> +
>>   return -ENOTSUP;
>>   }
>>   
>>
> 
> Hmm..
> 
> 1. No children -> -ENOTSUP
> 2. Only cow child -> 0
> 3. Some non-cow children -> SUM
> 
> It's all arguable (the strictest way is -ENOTSUP in either case),
> but if we want to fallback to SUM of non-cow children, 1. and 2. should return
> the same.

I don’t think 2 is possible at all.  If you have a COW child, you need
some other child to COW to.

And in the weird (and probably impossible) case where a node really only
has a COW child, I’d say it’s correct that it has a disk size of 0 –
because it hasn’t COWed anything yet.  (Just like a new qcow2 image with
a backing file only has its metadata as its disk size.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> If the driver does not implement bdrv_get_allocated_file_size(), we
> should fall back to cumulating the allocated size of all non-COW
> children instead of just bs->file.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---
>   block.c | 22 --
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 1070aa1ba9..6e1ddab056 100644
> --- a/block.c
> +++ b/block.c
> @@ -4650,9 +4650,27 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
> *bs)
>   if (drv->bdrv_get_allocated_file_size) {
>   return drv->bdrv_get_allocated_file_size(bs);
>   }
> -if (bs->file) {
> -return bdrv_get_allocated_file_size(bs->file->bs);
> +
> +if (!QLIST_EMPTY(>children)) {
> +BdrvChild *child;
> +int64_t child_size, total_size = 0;
> +
> +QLIST_FOREACH(child, >children, next) {
> +if (child == bdrv_filtered_cow_child(bs)) {
> +/* Ignore COW backing files */
> +continue;
> +}
> +
> +child_size = bdrv_get_allocated_file_size(child->bs);
> +if (child_size < 0) {
> +return child_size;
> +}
> +total_size += child_size;
> +}
> +
> +return total_size;
>   }
> +
>   return -ENOTSUP;
>   }
>   
> 

Hmm..

1. No children -> -ENOTSUP
2. Only cow child -> 0
3. Some non-cow children -> SUM

It's all arguable (the strictest way is -ENOTSUP in either case),
but if we want to fallback to SUM of non-cow children, 1. and 2. should return
the same.

-- 
Best regards,
Vladimir