On 06.11.19 09:32, Stefan Hajnoczi wrote:
> On Tue, Nov 05, 2019 at 11:02:44AM +0100, Dietmar Maurer wrote:
>> Example: Backup from ceph disk (rbd_cache=false) to local disk:
>>
>> backup_calculate_cluster_size returns 64K (correct for my local .raw image)
>>
>> Then the backup job starts to read 64K blocks from ceph.
>>
>> But ceph always reads 4M blocks, so this is incredibly slow and produces
>> way too much network traffic.
>>
>> Why does backup_calculate_cluster_size not consider the block size of
>> the source disk?
>>
>> cluster_size = MAX(block_size_source, block_size_target)
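
If I understand the suggestion correctly, it amounts to something like the
following rough, untested sketch of backup_calculate_cluster_size().  Error
handling is omitted, and it assumes the rbd driver actually reports its
4 MiB object size via bdrv_get_info(), which I have not checked:

static int64_t backup_calculate_cluster_size(BlockDriverState *source,
                                             BlockDriverState *target,
                                             Error **errp)
{
    BlockDriverInfo bdi;
    int64_t cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT; /* 64 KiB */

    /* What the function already does today: honour the target cluster size */
    if (bdrv_get_info(target, &bdi) == 0 && bdi.cluster_size > 0) {
        cluster_size = MAX(cluster_size, bdi.cluster_size);
    }

    /* The suggested addition: also honour the source block size */
    if (bdrv_get_info(source, &bdi) == 0 && bdi.cluster_size > 0) {
        cluster_size = MAX(cluster_size, bdi.cluster_size);
    }

    return cluster_size;
}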
So Ceph always transmits 4 MB over the network, no matter how little is
actually needed?  That sounds, well, interesting.

backup_calculate_cluster_size() doesn't consider the source's block size
because, to my knowledge, there is no other medium that behaves this way.
So I suppose the assumption has always been that the source's block size
doesn't matter, because a partial read is always possible (without having
to read everything).

What would make sense to me is to increase the buffer size in general.  I
don't think we need to copy only one cluster at a time, and commit
0e2402452f1f2042923a5 has indeed increased the copy size to 1 MB for backup
writes that are triggered by guest writes.

We haven't yet increased the copy size for background writes, though.  We
can do that, of course.  (And probably should; see the sketch below my
signature.)

The thing is, it just seems unnecessary to me to take the source cluster
size into account in general.  It seems weird that a medium only allows
4 MB reads, because, well, guests aren't going to take that into account
either.

Max
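
P.S.: To illustrate what I mean by increasing the copy size for the
background loop, here is a rough, untested sketch.  It is not the actual
block/backup.c code (the constant and helper name are made up), and the
real implementation would also have to keep requests aligned to the copy
bitmap's cluster granularity:

/* Hypothetical upper bound for one background copy request */
#define BACKUP_MAX_BUFFER (4 * MiB)

static int coroutine_fn backup_copy_chunked(BlockBackend *source,
                                            BlockBackend *target,
                                            int64_t offset, int64_t bytes)
{
    /* One large bounce buffer instead of a 64 KiB per-cluster buffer */
    uint8_t *buf = blk_blockalign(source, BACKUP_MAX_BUFFER);
    int ret = 0;

    while (bytes > 0) {
        /* Read and write up to 4 MiB per request, so e.g. a whole rbd
         * object is fetched with a single read */
        int64_t chunk = MIN(bytes, BACKUP_MAX_BUFFER);

        ret = blk_co_pread(source, offset, chunk, buf, 0);
        if (ret < 0) {
            break;
        }
        ret = blk_co_pwrite(target, offset, chunk, buf, 0);
        if (ret < 0) {
            break;
        }

        offset += chunk;
        bytes -= chunk;
    }

    qemu_vfree(buf);
    return ret;
}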
