Re: block stream and bitmaps

2020-03-25 Thread Kevin Wolf
Am 24.03.2020 um 20:19 hat John Snow geschrieben:
> 
> 
> On 3/24/20 6:18 AM, Kevin Wolf wrote:
> > Am 23.03.2020 um 19:06 hat John Snow geschrieben:
> >> Hi Kevin,
> >>
> >> I'm hoping to get some preliminary ideas from you (capped at five
> >> minutes' effort?) on this subject. My ideas here are a bit shaky, I only
> >> have really rough notions here.
> >>
> >> We want to use bitmaps with 'drive' semantics; i.e. tracking changes to
> >> the visible guest data. What we have are bitmaps with node semantics,
> >> tracking changes to the data at a particular level in the graph.
> >>
> >> For commit, this isn't a big deal: we can disable bitmaps in the backing
> >> file while we commit and then re-enable it on completion. We usually
> >> have a separate bitmap enabled on the root node that is recording writes
> >> during this time, and can be moved later.
> >>
> >> For streaming, this is more challenging: new writes will dirty the
> >> bitmap, but so will writes from the stream job itself.
> >>
> >> Semantically, we want to ignore writes from the stream while recording
> >> them from the guest -- differentiating based on source.
> > 
> > No, based on source is actually not what you want. What you really want
> > is that BDRV_REQ_WRITE_UNCHANGED doesn't mark any blocks dirty.
> > 
> 
> This is why I sent the mail, I figured you'd know the better incision
> point, and I was right!
> 
> > We discussed this specific case of streaming at FOSDEM (with Paolo and
> > probably Nir). Paolo was even convinced that unchanged writes already
> > behave like this, but we agreed that dirtying blocks for them would be a
> > bug. After checking that the code is indeed buggy, I was planning to
> > send a patch, but never got around to actually do that. Sorry about
> > that.
> 
> Glad to hear it has been given consideration, though!

Yes, if we hadn't talked about it, I probably wouldn't have had this
suggestion for you.

Anyway, someone still needs to write that patch. Should I try to find
time for it or are you (or someone else working on bitmaps) going to do
that?

> >> Bitmaps aren't really geared to do that right now. With the changes to
> >> Bdrv Roles that Max was engineering, do you think it's possible to add
> >> some kind of write source discrimination to bitmaps, or is that too messy?
> > 
> > I don't think it would work because copy-on-read requests come from the
> > same parent node as writes (no matter whether the legacy code in
> > block/io.c or a copy-on-read filter node is used).
> > 
> 
> Oh, understood. Rule that approach out, then.
> 
> >> For both commit and stream, it might be nice to say: "This bitmap is
> >> enabled, but ignores writes from [all? specific types? specific
> >> instances?] jobs.
> > 
> > Commit is a bit trickier, because it's not WRITE_UNCHANGED. The result
> > is only unchanged for the top layer, but not for the backing file you're
> > committing to. Not sure if we can represent this condition somehow.
> > 
> 
> Nothing comes to mind apart from a semantic that applies to a graph
> subsection instead of an individual node.
> 
> i.e. UNCHANGED as applied to [A --> B].
> 
> Not saying that's reasonable to develop... or necessarily even possible
> to enforce, just nothing else comes to mind.

And then the result for that request would be that it doesn't dirty any
bitmaps at A, but it does dirty the bitmaps at B?

I mean, many things can be developed, but being sure that they actually
implement the right thing is harder... So as long as we have the
workaround you mentioned above, maybe let's just ignore this for now.

> >> Or, I wonder if what we truly want is some kind of bitmap "forwarder"
> >> object on block-backend objects that represent the semantic drive view,
> >> and only writes through that *backend* get forwarded to the bitmaps
> >> attached to whatever node the bitmap is actually associated with.
> >>
> >> (That might wind up causing weird problems too, though... since those
> >> objects are no longer intended to be user-addressable, managing that
> >> configuration might get intensely strange.)
> > 
> > Hm... Drive-based does suggest that it's managed at the BlockBackend
> > level. So having a bitmap that isn't added as a dirty bitmap to the BDS,
> > but only to the BB does make sense to me. The BB would be addressed
> > with the qdev ID of the device, as usual (which underlines that it's
> > really per device).
> > 
> 
> That's the rough idea, though if it's needed or not is unclear. We might
> be able to get by with node semantics if we jazz them up enough...?

Ah. I just took it as a given that you need or want BlockBackend-based
bitmaps because that's how you started the email. Maybe I should look at
the bigger picture before suggesting things.

So what was the real motivation behind it?

> Working around all the edge bases of a drive-semantic bitmap seem
> difficult to reason about.
> 
> In general, it should likely be made persistent against the root-most
> node to which writ

Re: block stream and bitmaps

2020-03-24 Thread Vladimir Sementsov-Ogievskiy

24.03.2020 22:19, John Snow wrote:



On 3/24/20 6:18 AM, Kevin Wolf wrote:

Am 23.03.2020 um 19:06 hat John Snow geschrieben:

Hi Kevin,

I'm hoping to get some preliminary ideas from you (capped at five
minutes' effort?) on this subject. My ideas here are a bit shaky, I only
have really rough notions here.

We want to use bitmaps with 'drive' semantics; i.e. tracking changes to
the visible guest data. What we have are bitmaps with node semantics,
tracking changes to the data at a particular level in the graph.

For commit, this isn't a big deal: we can disable bitmaps in the backing
file while we commit and then re-enable it on completion. We usually
have a separate bitmap enabled on the root node that is recording writes
during this time, and can be moved later.

For streaming, this is more challenging: new writes will dirty the
bitmap, but so will writes from the stream job itself.

Semantically, we want to ignore writes from the stream while recording
them from the guest -- differentiating based on source.


No, based on source is actually not what you want. What you really want
is that BDRV_REQ_WRITE_UNCHANGED doesn't mark any blocks dirty.



This is why I sent the mail, I figured you'd know the better incision
point, and I was right!


We discussed this specific case of streaming at FOSDEM (with Paolo and
probably Nir). Paolo was even convinced that unchanged writes already
behave like this, but we agreed that dirtying blocks for them would be a
bug. After checking that the code is indeed buggy, I was planning to
send a patch, but never got around to actually do that. Sorry about
that.



Glad to hear it has been given consideration, though!


Bitmaps aren't really geared to do that right now. With the changes to
Bdrv Roles that Max was engineering, do you think it's possible to add
some kind of write source discrimination to bitmaps, or is that too messy?


I don't think it would work because copy-on-read requests come from the
same parent node as writes (no matter whether the legacy code in
block/io.c or a copy-on-read filter node is used).



Oh, understood. Rule that approach out, then.


For both commit and stream, it might be nice to say: "This bitmap is
enabled, but ignores writes from [all? specific types? specific
instances?] jobs.


Commit is a bit trickier, because it's not WRITE_UNCHANGED. The result
is only unchanged for the top layer, but not for the backing file you're
committing to. Not sure if we can represent this condition somehow.



Nothing comes to mind apart from a semantic that applies to a graph
subsection instead of an individual node.

i.e. UNCHANGED as applied to [A --> B].

Not saying that's reasonable to develop... or necessarily even possible
to enforce, just nothing else comes to mind.


Or, I wonder if what we truly want is some kind of bitmap "forwarder"
object on block-backend objects that represent the semantic drive view,
and only writes through that *backend* get forwarded to the bitmaps
attached to whatever node the bitmap is actually associated with.

(That might wind up causing weird problems too, though... since those
objects are no longer intended to be user-addressable, managing that
configuration might get intensely strange.)


Hm... Drive-based does suggest that it's managed at the BlockBackend
level. So having a bitmap that isn't added as a dirty bitmap to the BDS,
but only to the BB does make sense to me. The BB would be addressed
with the qdev ID of the device, as usual (which underlines that it's
really per device).



That's the rough idea, though if it's needed or not is unclear. We might
be able to get by with node semantics if we jazz them up enough...?

Working around all the edge bases of a drive-semantic bitmap seem
difficult to reason about.

In general, it should likely be made persistent against the root-most
node to which writes are routed to a protocol node.
In the common case, that means the top-most qcow2 format node of a chain.

But, many of our job filters also route writes to format nodes too, so
which one is the "canonical" store? Is it even possible to define?

Hm.

I suppose we can always allow bitmaps being attached to the BB and
*disallow them* from being made persistent; but simply exist as an
in-memory tool to help ease the pain of managing data consistency during
critical sections.

We can offer merge-to, merge-from semantics for these in-memory-only
bitmaps. Maybe.

Still mulling it over.


We can just add "empty" filter node, and move bitmaps to it and than back
again. If I understand correctly, this gives same semantics like with
moving to BB, but without any additional BB logic. We just need "do-nothing"
filter driver for it.




The part that's unclear to me is how to make such bitmaps persistent.
You can change the root node of a BB and even remove the root node
completely (for removable devices; but even changing is technically
remove followed by insert), so you may need to move the bitmap around
between image files 

Re: block stream and bitmaps

2020-03-24 Thread John Snow



On 3/24/20 6:18 AM, Kevin Wolf wrote:
> Am 23.03.2020 um 19:06 hat John Snow geschrieben:
>> Hi Kevin,
>>
>> I'm hoping to get some preliminary ideas from you (capped at five
>> minutes' effort?) on this subject. My ideas here are a bit shaky, I only
>> have really rough notions here.
>>
>> We want to use bitmaps with 'drive' semantics; i.e. tracking changes to
>> the visible guest data. What we have are bitmaps with node semantics,
>> tracking changes to the data at a particular level in the graph.
>>
>> For commit, this isn't a big deal: we can disable bitmaps in the backing
>> file while we commit and then re-enable it on completion. We usually
>> have a separate bitmap enabled on the root node that is recording writes
>> during this time, and can be moved later.
>>
>> For streaming, this is more challenging: new writes will dirty the
>> bitmap, but so will writes from the stream job itself.
>>
>> Semantically, we want to ignore writes from the stream while recording
>> them from the guest -- differentiating based on source.
> 
> No, based on source is actually not what you want. What you really want
> is that BDRV_REQ_WRITE_UNCHANGED doesn't mark any blocks dirty.
> 

This is why I sent the mail, I figured you'd know the better incision
point, and I was right!

> We discussed this specific case of streaming at FOSDEM (with Paolo and
> probably Nir). Paolo was even convinced that unchanged writes already
> behave like this, but we agreed that dirtying blocks for them would be a
> bug. After checking that the code is indeed buggy, I was planning to
> send a patch, but never got around to actually do that. Sorry about
> that.
> 

Glad to hear it has been given consideration, though!

>> Bitmaps aren't really geared to do that right now. With the changes to
>> Bdrv Roles that Max was engineering, do you think it's possible to add
>> some kind of write source discrimination to bitmaps, or is that too messy?
> 
> I don't think it would work because copy-on-read requests come from the
> same parent node as writes (no matter whether the legacy code in
> block/io.c or a copy-on-read filter node is used).
> 

Oh, understood. Rule that approach out, then.

>> For both commit and stream, it might be nice to say: "This bitmap is
>> enabled, but ignores writes from [all? specific types? specific
>> instances?] jobs.
> 
> Commit is a bit trickier, because it's not WRITE_UNCHANGED. The result
> is only unchanged for the top layer, but not for the backing file you're
> committing to. Not sure if we can represent this condition somehow.
> 

Nothing comes to mind apart from a semantic that applies to a graph
subsection instead of an individual node.

i.e. UNCHANGED as applied to [A --> B].

Not saying that's reasonable to develop... or necessarily even possible
to enforce, just nothing else comes to mind.

>> Or, I wonder if what we truly want is some kind of bitmap "forwarder"
>> object on block-backend objects that represent the semantic drive view,
>> and only writes through that *backend* get forwarded to the bitmaps
>> attached to whatever node the bitmap is actually associated with.
>>
>> (That might wind up causing weird problems too, though... since those
>> objects are no longer intended to be user-addressable, managing that
>> configuration might get intensely strange.)
> 
> Hm... Drive-based does suggest that it's managed at the BlockBackend
> level. So having a bitmap that isn't added as a dirty bitmap to the BDS,
> but only to the BB does make sense to me. The BB would be addressed
> with the qdev ID of the device, as usual (which underlines that it's
> really per device).
> 

That's the rough idea, though if it's needed or not is unclear. We might
be able to get by with node semantics if we jazz them up enough...?

Working around all the edge bases of a drive-semantic bitmap seem
difficult to reason about.

In general, it should likely be made persistent against the root-most
node to which writes are routed to a protocol node.
In the common case, that means the top-most qcow2 format node of a chain.

But, many of our job filters also route writes to format nodes too, so
which one is the "canonical" store? Is it even possible to define?

Hm.

I suppose we can always allow bitmaps being attached to the BB and
*disallow them* from being made persistent; but simply exist as an
in-memory tool to help ease the pain of managing data consistency during
critical sections.

We can offer merge-to, merge-from semantics for these in-memory-only
bitmaps. Maybe.

Still mulling it over.

> The part that's unclear to me is how to make such bitmaps persistent.
> You can change the root node of a BB and even remove the root node
> completely (for removable devices; but even changing is technically
> remove followed by insert), so you may need to move the bitmap around
> between image files and at least for some time you might not have any
> place to store the bitmap.
> 

Yeah, we've had discussions about this in the past. 

Re: block stream and bitmaps

2020-03-24 Thread Kevin Wolf
Am 23.03.2020 um 19:06 hat John Snow geschrieben:
> Hi Kevin,
> 
> I'm hoping to get some preliminary ideas from you (capped at five
> minutes' effort?) on this subject. My ideas here are a bit shaky, I only
> have really rough notions here.
> 
> We want to use bitmaps with 'drive' semantics; i.e. tracking changes to
> the visible guest data. What we have are bitmaps with node semantics,
> tracking changes to the data at a particular level in the graph.
> 
> For commit, this isn't a big deal: we can disable bitmaps in the backing
> file while we commit and then re-enable it on completion. We usually
> have a separate bitmap enabled on the root node that is recording writes
> during this time, and can be moved later.
> 
> For streaming, this is more challenging: new writes will dirty the
> bitmap, but so will writes from the stream job itself.
> 
> Semantically, we want to ignore writes from the stream while recording
> them from the guest -- differentiating based on source.

No, based on source is actually not what you want. What you really want
is that BDRV_REQ_WRITE_UNCHANGED doesn't mark any blocks dirty.

We discussed this specific case of streaming at FOSDEM (with Paolo and
probably Nir). Paolo was even convinced that unchanged writes already
behave like this, but we agreed that dirtying blocks for them would be a
bug. After checking that the code is indeed buggy, I was planning to
send a patch, but never got around to actually do that. Sorry about
that.

> Bitmaps aren't really geared to do that right now. With the changes to
> Bdrv Roles that Max was engineering, do you think it's possible to add
> some kind of write source discrimination to bitmaps, or is that too messy?

I don't think it would work because copy-on-read requests come from the
same parent node as writes (no matter whether the legacy code in
block/io.c or a copy-on-read filter node is used).

> For both commit and stream, it might be nice to say: "This bitmap is
> enabled, but ignores writes from [all? specific types? specific
> instances?] jobs.

Commit is a bit trickier, because it's not WRITE_UNCHANGED. The result
is only unchanged for the top layer, but not for the backing file you're
committing to. Not sure if we can represent this condition somehow.

> Or, I wonder if what we truly want is some kind of bitmap "forwarder"
> object on block-backend objects that represent the semantic drive view,
> and only writes through that *backend* get forwarded to the bitmaps
> attached to whatever node the bitmap is actually associated with.
> 
> (That might wind up causing weird problems too, though... since those
> objects are no longer intended to be user-addressable, managing that
> configuration might get intensely strange.)

Hm... Drive-based does suggest that it's managed at the BlockBackend
level. So having a bitmap that isn't added as a dirty bitmap to the BDS,
but only to the BB does make sense to me. The BB would be addressed
with the qdev ID of the device, as usual (which underlines that it's
really per device).

The part that's unclear to me is how to make such bitmaps persistent.
You can change the root node of a BB and even remove the root node
completely (for removable devices; but even changing is technically
remove followed by insert), so you may need to move the bitmap around
between image files and at least for some time you might not have any
place to store the bitmap.

Or you say that you store it in one specific node, be it the root node
of the BB or not, and it will always stay there no matter how you change
the graph and whether the BB and that node are even in the same subtree.
That node would just get an additonal refcount, so you can't remove it
until the BB goes away.

Unless you already have a better plan (I hope you do, I didn't think
about it for more than a few minutes), maybe the latter would actually
be the most reasonable solution.

Kevin