Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-31 Thread Vladimir Sementsov-Ogievskiy

30.03.2021 19:39, Max Reitz wrote:

==

OK, I think now that you didn't mean qcow2_alloc_clusters(). So we are talking only 
about functions that return an offset to a cluster with "guest data", not to any kind of 
host cluster. Then what you propose looks like this to me:

  - take my v5
  - rename qcow2_inflight_writes_dec() to put_cluster_offset()
  - call qcow2_inflight_writes_inc() from the three functions you mention


Yes, I think so.  Or you take the CoRwLock in those three functions, depending 
on which implementation we want.


CoRwLock brings one inconvenience: to pass ownership of the reference to another 
coroutine (for example, to the task coroutine in qcow2_co_pwritev()) we have to unlock 
the lock in the original coroutine and lock it in the other one, as CoRwLock doesn't 
support transferring ownership between coroutines. So we'd do put_cluster_offset() in 
qcow2_co_pwritev() and get_cluster_offset() in qcow2_co_pwritev_task() (and 
we must be sure that the second lock is taken before the first one is released). So 
in this case it's probably better to keep the direct CoRwLock interface rather than 
shadowing it with get/put.
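
For example, something like this (just a sketch: the qemu_co_rwlock_*() and
coroutine calls are the real primitives; everything else, including the lock
placement and helper names, is made up):

#include "qemu/coroutine.h"

/*
 * Sketch of handing a CoRwLock read lock from qcow2_co_pwritev() to its
 * task coroutine. There is no transfer operation, so the task has to
 * acquire its own rdlock before the parent releases the one it holds.
 */

static CoRwlock discard_rw_lock;    /* really a BDRVQcow2State field,
                                       initialized with qemu_co_rwlock_init() */

static void coroutine_fn pwritev_task_entry(void *opaque)
{
    /* Second reader: taken while the parent still holds its rdlock. */
    qemu_co_rwlock_rdlock(&discard_rw_lock);

    /* ... write guest data to the host cluster ... */

    qemu_co_rwlock_unlock(&discard_rw_lock);
}

static void coroutine_fn pwritev(void)
{
    qemu_co_rwlock_rdlock(&discard_rw_lock);

    /*
     * qemu_coroutine_enter() runs the task up to its first yield, so the
     * task's rdlock above is normally taken before we drop ours below.
     * But if a writer queued up between the two rdlock calls, the task
     * would block, breaking the "second lock before first unlock" rule;
     * exactly the tricky part mentioned above.
     */
    qemu_coroutine_enter(qemu_coroutine_create(pwritev_task_entry, NULL));

    qemu_co_rwlock_unlock(&discard_rw_lock);
}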

--
Best regards,
Vladimir



Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2021 19:39, Max Reitz wrote:

On 30.03.21 15:25, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 15:51, Max Reitz wrote:

On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 12:49, Max Reitz wrote:

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t hit this 
because they serialize discards?


I think we've never had bugs here, so of course we can wait.



There’s also something Kevin wrote on IRC a couple of weeks ago, for which I 
had hoped he’d send an email, but I don’t think he did, so I’ll try to remember 
and paraphrase as well as I can...

He basically asked whether it wouldn’t be conceptually simpler to take a 
reference to some cluster in get_cluster_offset() and later release it with a 
to-be-added put_cluster_offset().

He also noted that reading is problematic, too, because if you read a discarded 
and reused cluster, this might result in an information leak (some guest 
application might be able to read data it isn’t allowed to read); that’s why 
making get_cluster_offset() the point of locking clusters against discarding 
would be better.


Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 0/6] qcow2: 
fix parallel rewrite and discard (lockless)).



This would probably work with both of your solutions.  For the in-memory 
solution, you’d take a refcount on an actual cluster; in the CoRwLock 
solution, you’d take that lock.

What do you think?



Hmm, what do you mean? Just rename my qcow2_inflight_writes_inc() and 
qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to 
make them more natural to use for read operations as well?


Hm.  Our discussion wasn’t so detailed.

I interpreted it to mean all qcow2 functions that find an offset to a qcow2 
cluster, namely qcow2_get_host_offset(), qcow2_alloc_host_offset(), and 
qcow2_alloc_compressed_cluster_offset().


What about qcow2_alloc_clusters() ?


Seems like all callers of it except do_alloc_cluster_offset() call it to 
allocate metadata clusters, which cannot be discarded from the guest.

Is it really possible that some metadata cluster is used while qcow2 discards 
it internally at the same time, or isn’t all of this only a problem for data 
clusters?


It's a good question, and I don't think it has a quick answer.

I think:

1. On normal code paths we always handle qcow2 metadata under the s->lock 
mutex. So we can't drop a refcount to zero in parallel, as update_refcount() 
should be called under s->lock too.

2. [1] is violated at least by code paths that don't run in coroutines (for example, 
qcow2 snapshots). But it seems that on these paths we are guarded by other things 
(like drained sections). For sure we should move the whole qcow2 driver to 
coroutines and do every metadata update under s->lock.

3. Whether [1] is violated on some coroutine paths, I don't know. I hope it 
isn't; otherwise we would have bugs.

In the future we'll probably want to work with metadata without s->lock, at least for 
I/O. Then we'll need the same mechanisms we are now inventing for data writes. But 
I'm not sure it will be so simple for metadata: for now the mutex prevents not 
only parallel write & discard of metadata, but also parallel updates. (For 
parallel guest writes to the same cluster we don't care; that's the guest's problem.)
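
To illustrate [1] (a sketch, as if inside block/qcow2-refcount.c where
update_refcount() lives; the wrapper function is made up, but the locking
pattern is the real one):

/*
 * Any refcount update, including the one that drops a refcount to zero
 * and thereby frees a host cluster, happens only under s->lock. So a
 * coroutine that reads qcow2 metadata while holding s->lock cannot see
 * the cluster being freed under it.
 */
static int coroutine_fn discard_one_cluster(BlockDriverState *bs,
                                            int64_t offset)
{
    BDRVQcow2State *s = bs->opaque;
    int ret;

    qemu_co_mutex_lock(&s->lock);
    /* The refcount may reach zero here; the cluster becomes reusable. */
    ret = update_refcount(bs, offset, s->cluster_size,
                          1 /* addend */, true /* decrease */,
                          QCOW2_DISCARD_REQUEST);
    qemu_co_mutex_unlock(&s->lock);

    return ret;
}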




When those functions return an offset (in)to some cluster, that cluster (or the 
image as a whole) should be locked against discards. Every offset received this 
way would require an accompanying qcow2_put_host_offset().


Or do you mean updating every kind of "get cluster offset" operation in the whole qcow2 driver to take a 
kind of "dynamic reference count" in get_cluster_offset() and then call a corresponding 
put() somewhere? In that case I'm afraid it's a lot more work.


Hm, really?  I would have assumed we need to do some locking in all functions 
that get a cluster offset this way, so it should be less work to take the lock 
in the functions they invoke to get the offset.


There would also be the problem that a lot of paths in qcow2 are not in coroutines and 
don't even take s->lock when they actually should.


I’m not sure what you mean here, because all functions that invoke any of the 
three functions I listed above are coroutine_fns (or, well, I didn’t look it 
up, but they all have *_co_* in their name).


qcow2_alloc_clusters() has a lot more callers...


Let’s hope we don’t need to worry about it then. O:)


This would also mean that we do the same job that normal qcow2 refcounts already do: there's no sense in 
keeping an additional "dynamic refcount" for an L2 table cluster while reading it, as 
we already have a non-zero normal qcow2 refcount for it.


I’m afraid I don’t understand how normal refcounts relate to this. For example, 
qcow2_get_host_offset() doesn’t touch refcounts at all.



I mean the following: remember our discussion about what a free cluster is. If we add 
Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Max Reitz

On 30.03.21 15:25, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 15:51, Max Reitz wrote:

On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 12:49, Max Reitz wrote:

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t 
hit this because they serialize discards?


I think we've never had bugs here, so of course we can wait.



There’s also something Kevin wrote on IRC a couple of weeks ago, for 
which I had hoped he’d send an email, but I don’t think he did, so 
I’ll try to remember and paraphrase as well as I can...


He basically asked whether it wouldn’t be conceptually simpler to 
take a reference to some cluster in get_cluster_offset() and later 
release it with a to-be-added put_cluster_offset().


He also noted that reading is problematic, too, because if you read 
a discarded and reused cluster, this might result in an information 
leak (some guest application might be able to read data it isn’t 
allowed to read); that’s why making get_cluster_offset() the point 
of locking clusters against discarding would be better.


Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 
0/6] qcow2: fix parallel rewrite and discard (lockless)).




This would probably work with both of your solutions.  For the 
in-memory solution, you’d take a refcount on an actual cluster; in 
the CoRwLock solution, you’d take that lock.


What do you think?



Hmm, what do you mean? Just rename my qcow2_inflight_writes_inc() and 
qcow2_inflight_writes_dec() to 
get_cluster_offset()/put_cluster_offset(), to make them more natural to 
use for read operations as well?


Hm.  Our discussion wasn’t so detailed.

I interpreted it to mean all qcow2 functions that find an offset to a 
qcow2 cluster, namely qcow2_get_host_offset(), 
qcow2_alloc_host_offset(), and qcow2_alloc_compressed_cluster_offset().


What about qcow2_alloc_clusters()?


Seems like all callers of it except do_alloc_cluster_offset() call it to 
allocate metadata clusters, which cannot be discarded from the guest.


Is it really possible that some metadata cluster is used while qcow2 
discards it internally at the same time, or isn’t all of this only a 
problem for data clusters?


When those functions return an offset (in)to some cluster, that 
cluster (or the image as a whole) should be locked against discards.  
Every offset received this way would require an accompanying 
qcow2_put_host_offset().


Or do you mean updating every kind of "get cluster offset" operation in the whole qcow2 
driver to take a kind of "dynamic reference count" in 
get_cluster_offset() and then call a corresponding put() somewhere? In 
that case I'm afraid it's a lot more work.


Hm, really?  I would have assumed we need to do some locking in all 
functions that get a cluster offset this way, so it should be less 
work to take the lock in the functions they invoke to get the offset.


There would also be the problem that a lot of paths in qcow2 are not in 
coroutines and don't even take s->lock when they actually should.


I’m not sure what you mean here, because all functions that invoke any 
of the three functions I listed above are coroutine_fns (or, well, I 
didn’t look it up, but they all have *_co_* in their name).


qcow2_alloc_clusters() has a lot more callers...


Let’s hope we don’t need to worry about it then. O:)

This would also mean that we do the same job that normal qcow2 refcounts 
already do: there's no sense in keeping an additional "dynamic refcount" for an L2 
table cluster while reading it, as we already have a non-zero normal 
qcow2 refcount for it.


I’m afraid I don’t understand how normal refcounts relate to this.  
For example, qcow2_get_host_offset() doesn’t touch refcounts at all.




I mean the following: remember our discussion about what a free cluster 
is. If we add a "dynamic refcount" (or "inflight-write-counter") thing 
only to count in-flight data-write operations (or, as discussed, data 
reads as well), then the "full reference count" of a cluster is 
inflight-write-count + qcow2-metadata-refcount.


But if we add a kind of "dynamic refcount" for any use of a host cluster, 
for example reading an L2 table, then our "dynamic refcount" duplicates 
the reference to this L2 table that already exists in qcow2 metadata 
(represented as a refcount), and we don't have a concept of "full 
reference count" as the sum above. We should still treat a cluster as 
free when both the "dynamic refcount" and the qcow2-metadata refcount 
are zero, but their sum doesn't mean anything useful. Maybe not a 
problem, but it looks like a complication with no benefit.


You’re right, but I don’t think that’s a real problem.  Perhaps the sum 
was even a false equivalency.  There is a difference between the dynamic 
refcount and the on-disk refcount: we must wait with discarding until 
the dynamic refcount is 0, and discarding will then drop the on-disk 
refcount to 0, too.  I think.  So I’m not sure whether the sum 

Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2021 15:51, Max Reitz wrote:

On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 12:49, Max Reitz wrote:

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t hit this 
because they serialize discards?


I think we've never had bugs here, so of course we can wait.



There’s also something Kevin wrote on IRC a couple of weeks ago, for which I 
had hoped he’d send an email, but I don’t think he did, so I’ll try to remember 
and paraphrase as well as I can...

He basically asked whether it wouldn’t be conceptually simpler to take a 
reference to some cluster in get_cluster_offset() and later release it with a 
to-be-added put_cluster_offset().

He also noted that reading is problematic, too, because if you read a discarded 
and reused cluster, this might result in an information leak (some guest 
application might be able to read data it isn’t allowed to read); that’s why 
making get_cluster_offset() the point of locking clusters against discarding 
would be better.


Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 0/6] qcow2: 
fix parallel rewrite and discard (lockless)).



This would probably work with both of your solutions.  For the in-memory 
solution, you’d take a refcount on an actual cluster; in the CoRwLock 
solution, you’d take that lock.

What do you think?



Hmm, what do you mean? Just rename my qcow2_inflight_writes_inc() and 
qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to 
make them more natural to use for read operations as well?


Hm.  Our discussion wasn’t so detailed.

I interpreted it to mean all qcow2 functions that find an offset to a qcow2 
cluster, namely qcow2_get_host_offset(), qcow2_alloc_host_offset(), and 
qcow2_alloc_compressed_cluster_offset().


What about qcow2_alloc_clusters()?



When those functions return an offset (in)to some cluster, that cluster (or the 
image as a whole) should be locked against discards.  Every offset received 
this way would require an accompanying qcow2_put_host_offset().


Or do you mean updating every kind of "get cluster offset" operation in the whole qcow2 driver to take a 
kind of "dynamic reference count" in get_cluster_offset() and then call a corresponding 
put() somewhere? In that case I'm afraid it's a lot more work.


Hm, really?  I would have assumed we need to do some locking in all functions 
that get a cluster offset this way, so it should be less work to take the lock 
in the functions they invoke to get the offset.


There would also be the problem that a lot of paths in qcow2 are not in coroutines and 
don't even take s->lock when they actually should.


I’m not sure what you mean here, because all functions that invoke any of the 
three functions I listed above are coroutine_fns (or, well, I didn’t look it 
up, but they all have *_co_* in their name).


qcow2_alloc_clusters() has a lot more callers...




This would also mean that we do the same job that normal qcow2 refcounts already do: there's no sense in 
keeping an additional "dynamic refcount" for an L2 table cluster while reading it, as 
we already have a non-zero normal qcow2 refcount for it.


I’m afraid I don’t understand how normal refcounts relate to this.  For 
example, qcow2_get_host_offset() doesn’t touch refcounts at all.



I mean the following: remember our discussion about what a free cluster is. If we add 
a "dynamic refcount" (or "inflight-write-counter") thing only to count in-flight data-write 
operations (or, as discussed, data reads as well), then the "full reference count" of 
a cluster is inflight-write-count + qcow2-metadata-refcount.

But if we add a kind of "dynamic refcount" for any use of a host cluster, for example reading an L2 table, then 
our "dynamic refcount" duplicates the reference to this L2 table that already exists in qcow2 
metadata (represented as a refcount), and we don't have a concept of "full reference count" as the sum 
above. We should still treat a cluster as free when both the "dynamic refcount" and the qcow2-metadata 
refcount are zero, but their sum doesn't mean anything useful. Maybe not a problem, but it looks like a 
complication with no benefit.
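
In code, the difference is something like this (plain C, hypothetical names, 
just to state the point):

#include <stdbool.h>
#include <stdint.h>

/*
 * (a) If the dynamic refcount counts only in-flight guest-data I/O,
 * the sum is a meaningful "full reference count":
 */
static uint64_t full_refcount(uint64_t metadata_refcount,
                              uint64_t inflight_data_io)
{
    return metadata_refcount + inflight_data_io;
}

/*
 * (b) If the dynamic refcount counts any use of a host cluster (for
 * example, reading an L2 table that metadata refcounts already pin),
 * the same use is counted twice and the sum means nothing; only this
 * test remains valid:
 */
static bool cluster_is_free(uint64_t metadata_refcount,
                            uint64_t dynamic_refcount)
{
    return metadata_refcount == 0 && dynamic_refcount == 0;
}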


==

OK, I think now that you didn't mean qcow2_alloc_clusters(). So we are talking only 
about functions that return an offset to a cluster with "guest data", not to any kind of 
host cluster. Then what you propose looks like this to me:

 - take my v5
 - rename qcow2_inflight_writes_dec() to put_cluster_offset()
 - call qcow2_inflight_writes_inc() from the three functions you mention

That makes sense to me. Still, the put_cluster_offset() name doesn't make it obvious that it's 
only for clusters with "guest data" and that we shouldn't call it when working with 
metadata clusters.

--
Best regards,
Vladimir



Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Max Reitz

On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 12:49, Max Reitz wrote:

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t hit 
this because they serialize discards?


I think we've never had bugs here, so of course we can wait.



There’s also something Kevin wrote on IRC a couple of weeks ago, for 
which I had hoped he’d send an email, but I don’t think he did, so I’ll 
try to remember and paraphrase as well as I can...


He basically asked whether it wouldn’t be conceptually simpler to take 
a reference to some cluster in get_cluster_offset() and later release 
it with a to-be-added put_cluster_offset().


He also noted that reading is problematic, too, because if you read a 
discarded and reused cluster, this might result in an information leak 
(some guest application might be able to read data it isn’t allowed to 
read); that’s why making get_cluster_offset() the point of locking 
clusters against discarding would be better.


Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 0/6] 
qcow2: fix parallel rewrite and discard (lockless)).




This would probably work with both of your solutions.  For the 
in-memory solution, you’d take a refcount on an actual cluster; in 
the CoRwLock solution, you’d take that lock.


What do you think?



Hmm, what do you mean? Just rename my qcow2_inflight_writes_inc() and 
qcow2_inflight_writes_dec() to 
get_cluster_offset()/put_cluster_offset(), to make them more natural to use 
for read operations as well?


Hm.  Our discussion wasn’t so detailed.

I interpreted it to mean all qcow2 functions that find an offset to a 
qcow2 cluster, namely qcow2_get_host_offset(), 
qcow2_alloc_host_offset(), and qcow2_alloc_compressed_cluster_offset().


When those functions return an offset (in)to some cluster, that cluster 
(or the image as a whole) should be locked against discards.  Every 
offset received this way would require an accompanying 
qcow2_put_host_offset().


Or do you mean updating every kind of "get cluster offset" operation in the whole qcow2 
driver to take a kind of "dynamic reference count" in 
get_cluster_offset() and then call a corresponding put() somewhere? In 
that case I'm afraid it's a lot more work.


Hm, really?  I would have assumed we need to do some locking in all 
functions that get a cluster offset this way, so it should be less work 
to take the lock in the functions they invoke to get the offset.


There would also be the problem 
that a lot of paths in qcow2 are not in coroutines and don't even take 
s->lock when they actually should.


I’m not sure what you mean here, because all functions that invoke any 
of the three functions I listed above are coroutine_fns (or, well, I 
didn’t look it up, but they all have *_co_* in their name).


This would also mean that we do the same 
job that normal qcow2 refcounts already do: there's no sense in keeping an additional 
"dynamic refcount" for an L2 table cluster while reading it, as we already 
have a non-zero normal qcow2 refcount for it.


I’m afraid I don’t understand how normal refcounts relate to this.  For 
example, qcow2_get_host_offset() doesn’t touch refcounts at all.


Max




Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2021 12:49, Max Reitz wrote:

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t hit this 
because they serialize discards?


I think we've never had bugs here, so of course we can wait.



There’s also something Kevin wrote on IRC a couple of weeks ago, for which I 
had hoped he’d send an email, but I don’t think he did, so I’ll try to remember 
and paraphrase as well as I can...

He basically asked whether it wouldn’t be conceptually simpler to take a 
reference to some cluster in get_cluster_offset() and later release it with a 
to-be-added put_cluster_offset().

He also noted that reading is problematic, too, because if you read a discarded 
and reused cluster, this might result in an information leak (some guest 
application might be able to read data it isn’t allowed to read); that’s why 
making get_cluster_offset() the point of locking clusters against discarding 
would be better.


Yes, I thought about reads too (RFC'd in the cover letter of [PATCH v5 0/6] qcow2: 
fix parallel rewrite and discard (lockless)).



This would probably work with both of your solutions.  For the in-memory 
solution, you’d take a refcount on an actual cluster; in the CoRwLock 
solution, you’d take that lock.

What do you think?



Hmm, what do you mean? Just rename my qcow2_inflight_writes_inc() and 
qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to 
make them more natural to use for read operations as well?

Or do you mean updating every kind of "get cluster offset" operation in the whole qcow2 driver to take a kind of 
"dynamic reference count" in get_cluster_offset() and then call a corresponding put() somewhere? In that 
case I'm afraid it's a lot more work. There would also be the problem that a lot of paths in qcow2 are not in 
coroutines and don't even take s->lock when they actually should. This would also mean that we do the same job that 
normal qcow2 refcounts already do: there's no sense in keeping an additional "dynamic refcount" for an L2 table 
cluster while reading it, as we already have a non-zero normal qcow2 refcount for it.


--
Best regards,
Vladimir



Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-30 Thread Max Reitz

On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:

ping. Do we want it for 6.0?


I’d rather wait.  I think the conclusion was that guests shouldn’t hit 
this because they serialize discards?


There’s also something Kevin wrote on IRC a couple of weeks ago, for 
which I had hoped he’d send an email, but I don’t think he did, so I’ll 
try to remember and paraphrase as well as I can...


He basically asked whether it wouldn’t be conceptually simpler to take a 
reference to some cluster in get_cluster_offset() and later release it 
with a to-be-added put_cluster_offset().


He also noted that reading is problematic, too, because if you read a 
discarded and reused cluster, this might result in an information leak 
(some guest application might be able to read data it isn’t allowed to 
read); that’s why making get_cluster_offset() the point of locking 
clusters against discarding would be better.


This would probably work with both of your solutions.  For the in-memory 
solution, you’d take a refcount on an actual cluster; in the CoRwLock 
solution, you’d take that lock.
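
As a sketch of what I mean, on the read path (bdrv_co_preadv() is real; the 
locked lookup helper is made up, and qcow2_put_host_offset() is just the name 
proposed above):

/*
 * Hold the "reference" (an in-memory refcount or a CoRwLock rdlock,
 * depending on the implementation) from offset lookup until the data
 * read completes, so the cluster cannot be discarded and reused (and
 * leak foreign data) in between.
 */
static int coroutine_fn read_guest_cluster(BlockDriverState *bs,
                                           uint64_t guest_offset,
                                           uint64_t bytes,
                                           QEMUIOVector *qiov)
{
    BDRVQcow2State *s = bs->opaque;
    uint64_t host_offset;
    int ret;

    /*
     * Hypothetical variant of qcow2_get_host_offset() that also takes
     * the reference (refcount++, or qemu_co_rwlock_rdlock()) before
     * returning, so there is no unprotected window.
     */
    ret = qcow2_get_host_offset_ref(bs, guest_offset, &host_offset);
    if (ret < 0) {
        return ret;
    }

    /* The cluster cannot be discarded and reused while we read it. */
    ret = bdrv_co_preadv(s->data_file, host_offset, bytes, qiov, 0);

    /* Release: refcount--, or qemu_co_rwlock_unlock(). */
    qcow2_put_host_offset(bs, host_offset);
    return ret;
}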


What do you think?

Max




Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

2021-03-25 Thread Vladimir Sementsov-Ogievskiy

ping. Do we want it for 6.0?

19.03.2021 13:08, Vladimir Sementsov-Ogievskiy wrote:

Look at 03 for the problem and fix. 01 is preparation and 02 is the
test.

Actually, the previous version of this series is
[PATCH v2(RFC) 0/3] qcow2: fix parallel rewrite and discard

Still,
[PATCH v3 0/6] qcow2: compressed write cache
includes another (more complicated) fix for the bug, so this one is
called v4.

So, what's new:

It's still a CoRwLock-based solution, as suggested by Kevin.

Now I think that the "writer" side of the lock should be the code in
update_refcount() that wants to set a refcount to zero. If we consider
only guest discard requests as "writers", we may miss other sources of
discarding host clusters (like rewriting a compressed cluster as a
normal one, maybe some snapshot operations, who knows what else).

And this means that we want to take the rw-lock under the qcow2 s->lock.
This brings an ordering restriction for the two locks: if we want both
locks taken, we should always take s->lock first, and never take s->lock
when the rw-lock is already taken (otherwise we get a classic deadlock).

This leads us to taking the rd-lock for in-flight writes under s->lock,
in the same critical section where the cluster is allocated (or just
fetched from metadata), and releasing it after the data write completes.

This in turn leads to slightly tricky logic around transferring the
rd-lock to the task coroutine on the normal write path (see 03). But
this is still simpler than the inflight-write-counter solution in v3.
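
In code, the scheme looks roughly like this (a sketch: discard_rw_lock
matches the name patch 03 introduces, but the two functions and the exact
field placement are illustrative):

/* Lock order: s->lock first, then discard_rw_lock, never the reverse. */

/*
 * Writer side: the update_refcount() path, already under s->lock, about
 * to set a refcount to zero. It waits for all in-flight reads/writes of
 * data clusters before the host cluster can be freed and reused.
 */
static void coroutine_fn discard_host_cluster(BDRVQcow2State *s)
{
    qemu_co_rwlock_wrlock(&s->discard_rw_lock);
    /* ... set the refcount to zero; the cluster may now be reused ... */
    qemu_co_rwlock_unlock(&s->discard_rw_lock);
}

/*
 * Reader side: the guest write path. The rd-lock is taken in the same
 * s->lock critical section that produced the cluster offset, so no
 * discard can slip in between lookup and locking.
 */
static void coroutine_fn write_guest_data(BDRVQcow2State *s)
{
    qemu_co_mutex_lock(&s->lock);
    /* ... allocate the host cluster, or look it up in metadata ... */
    qemu_co_rwlock_rdlock(&s->discard_rw_lock);
    qemu_co_mutex_unlock(&s->lock);

    /* ... write guest data, maybe in a task coroutine (see 03) ... */

    qemu_co_rwlock_unlock(&s->discard_rw_lock);
}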

Vladimir Sementsov-Ogievskiy (3):
   qemu-io: add aio_discard
   iotests: add qcow2-discard-during-rewrite
   block/qcow2: introduce discard_rw_lock: fix discarding host clusters

  block/qcow2.h |  20 +++
  block/qcow2-refcount.c|  22 
  block/qcow2.c |  73 +--
  qemu-io-cmds.c| 117 ++
  .../tests/qcow2-discard-during-rewrite|  99 +++
  .../tests/qcow2-discard-during-rewrite.out|  17 +++
  6 files changed, 341 insertions(+), 7 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
  create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out




--
Best regards,
Vladimir