unmap

Andrey Drobyshev Thu, 09 Nov 2023 08:48:05 -0800

On 11/3/23 17:51, Hanna Czenczek wrote:
> On 20.10.23 23:56, Andrey Drobyshev wrote:
>> Add _verify_du_delta() checker which is used to check that real disk
>> usage delta meets the expectations.  For now we use it for checking that
>> subcluster-based discard/unmap operations lead to actual disk usage
>> decrease (i.e. PUNCH_HOLE operation is performed).
> 
> I’m not too happy about checking the disk usage because that relies on
> the underlying filesystem actually accepting and executing the unmap. 
> Why is it not enough to check the L2 bitmap?
> 
> …Coming back later (I had to fix the missing `ret = ` I mentioned in
> patch 2, or this test would hang, so I couldn’t run it at first), I note
> that checking the disk usage in fact doesn’t work on tmpfs.  I usually
> run the iotests in tmpfs, so that’s not great.
>


My original idea was to make sure that the PUNCH_HOLE operation did
indeed take place, i.e. there was an actual discard.  For instance,
currently the discard operation initiated by qemu-io is called with the
QCOW2_DISCARD_REQUEST discard type, but if some other type is passed by
mistake, qcow2_queue_discard() won't be called, and though the
subclusters will be marked unallocated in L2 the data will still be
there.  Not quite what we expect from discard operation.

BTW checking the disk usage on tmpfs works on my machine:

> # cd /tmp; df -Th /tmp
> Filesystem     Type   Size  Used Avail Use% Mounted on
> tmpfs          tmpfs   32G  2.5M   32G   1% /tmp
> # BUILD=/root/src/qemu/master/build
> # $BUILD/qemu-img create -f qcow2 -o extended_l2=on img.qcow2 1M
> Formatting 'img.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=on 
> compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
> # $BUILD/qemu-io -c 'write -q 0 128k' img.qcow2
> # du --block-size=1 img.qcow2
> 397312  img.qcow2
> # $BUILD/qemu-io -f qcow2 -c 'discard -q 0 8k' img.qcow2
> # du --block-size=1 img.qcow2
> 389120  img.qcow2
> # $BUILD/qemu-io -f qcow2 -c 'discard -q 8k 120k' img.qcow2
> # du --block-size=1 img.qcow2
> 266240  img.qcow2

I'm wondering what this might depend on and can't we overcome this?

>> Also add separate test case for discarding particular subcluster within
>> one cluster.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/271     | 25 ++++++++++++++++++++++++-
>>   tests/qemu-iotests/271.out |  2 ++
>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
>> index c7c2cadda0..5fcb209f5f 100755
>> --- a/tests/qemu-iotests/271
>> +++ b/tests/qemu-iotests/271
>> @@ -81,6 +81,15 @@ _verify_l2_bitmap()
>>       fi
>>   }
>>   +# Check disk usage delta after a discard/unmap operation
>> +# _verify_du_delta $before $after $expected_delta
>> +_verify_du_delta()
>> +{
>> +    if [ $(($1 - $2)) -ne $3 ]; then
>> +        printf "ERROR: unexpected delta: $1 - $2 = $(($1 - $2)) != $3\n"
>> +    fi
>> +}
>> +
>>   # This should be called as _run_test c=XXX sc=XXX off=XXX len=XXX
>> cmd=XXX
>>   # c:   cluster number (0 if unset)
>>   # sc:  subcluster number inside cluster @c (0 if unset)
>> @@ -198,9 +207,12 @@ for use_backing_file in yes no; do
>>       alloc="$(seq 0 31)"; zero=""
>>       _run_test sc=0 len=64k
>>   -    ### Zero and unmap half of cluster #0 (this won't unmap it)
>> +    ### Zero and unmap half of cluster #0 (this will unmap it)
> 
> I think “it” refers to the cluster, and it is not unmapped.  This test
> case does not use a discard, but write -z instead, so it worked before. 
> (The L2 bitmap shown in the output doesn’t change, so functionally, this
> patch series didn’t change this case.)
> 

>From the _run_test() implementation:

> # cmd: the command to pass to qemu-io, must be one of                         
>   
> #      write    -> write                                                      
>   
> #      zero     -> write -z                                                   
>   
> #      unmap    -> write -z -u           <-------------                       
>                 
> #      compress -> write -c                                                   
>   
> #      discard  -> discard                                                    
>   
> _run_test()

So it actually uses 'write -z -u', and we end up with an actual unmap.
I agree that the l2 bitmap doesn't change, that's why I specifically
added disk usage check to catch the changed functionality.

>>       alloc="$(seq 16 31)"; zero="$(seq 0 15)"
>> +    before=$(disk_usage "$TEST_IMG")
>>       _run_test sc=0 len=32k cmd=unmap
>> +    after=$(disk_usage "$TEST_IMG")
>> +    _verify_du_delta $before $after 32768
>>         ### Zero and unmap cluster #0
>>       alloc=""; zero="$(seq 0 31)"
> 
> For this following case shown truncated here, why don’t we try
> “_run_test sc=16 len=32k cmd=unmap” instead of “sc=0 len=64k”?  I.e.
> unmap only the second half, which, thanks to patch 3, should still unmap
> the whole cluster, because the first half is already unmapped.
> 

Agreed.  And the interesting part is that here we'd be calling 'write -u
-z', thus following the zero_l2_subclusters() ->
discard_l2_subclusters() -> discard_in_l2_slice() path...

>> @@ -447,7 +459,10 @@ for use_backing_file in yes no; do
>>         # Subcluster-aligned request from clusters #12 to #14
>>       alloc="$(seq 0 15)"; zero="$(seq 16 31)"
>> +    before=$(disk_usage "$TEST_IMG")
>>       _run_test c=12 sc=16 len=128k cmd=unmap
>> +    after=$(disk_usage "$TEST_IMG")
>> +    _verify_du_delta $before $after $((128 * 1024))
>>       alloc=""; zero="$(seq 0 31)"
>>       _verify_l2_bitmap 13
>>       alloc="$(seq 16 31)"; zero="$(seq 0 15)"
>> @@ -528,6 +543,14 @@ for use_backing_file in yes no; do
>>       else
>>           _make_test_img -o extended_l2=on 1M
>>       fi
>> +    # Write cluster #0 and discard its subclusters #0-#3
>> +    $QEMU_IO -c 'write -q 0 64k' "$TEST_IMG"
>> +    before=$(disk_usage "$TEST_IMG")
>> +    $QEMU_IO -c 'discard -q 0 8k' "$TEST_IMG"
>> +    after=$(disk_usage "$TEST_IMG")
>> +    _verify_du_delta $before $after 8192
>> +    alloc="$(seq 4 31)"; zero="$(seq 0 3)"
>> +    _verify_l2_bitmap 0
>>       # Write clusters #0-#2 and then discard them
>>       $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG"
>>       $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG"
> 
> Similarly to above, I think it would be good if we combined this
> following case with the one you added, i.e. to write 128k from the
> beginning, drop the write here, and change the discard to be “discard -q
> 8k 120k”, i.e. skip the subclusters we have already discarded, to see
> that this is still combined to discard the whole first cluster.
> 

...and here we'd be calling discard directly, thus following the
discard_l2_subclusters() -> discard_in_l2_slice() path.

> ...Ah, see, and when I try this, the following assertion fails:
> 
> qemu-io: ../block/qcow2-cache.c:156: qcow2_cache_destroy: Assertion
> `c->entries[i].ref == 0' failed.
> ./common.rc: line 220: 128894 Aborted                 (core dumped) (
> VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec
> "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
> 

Yes, I should've added qcow2_cache_put() when going
discard_l2_subclusters() -> discard_in_l2_slice(), same as I did on the
zero_l2_subclusters() -> zero_in_l2_slice() path.  Thanks for catching.

> Looks like an L2 table is leaked somewhere.  That’s why SCRI should be a
> g_auto()-able type.
> 

In this case this indeed makes sense, since when we extend the operation
from the subclusters range to the entire cluster, SCRI is no longer
needed.  The only question with this approach is the
zero_l2_subclusters() -> discard_l2_subclusters() path.

> Hanna
> 
> [...]

Re: [PATCH 7/7] iotests/271: check disk usage on subcluster-based discard/unmap

Reply via email to