Blockdev community,

Our group would like to write block device backups directly to an object 
store, using an interface such as s3fs or rclone-mount. We've run into 
problems with both interfaces, and in both cases the problems revolve 
around fdatasync system calls. With s3fs, fdatasync calls are painfully 
slow. With rclone-mount, the calls are very fast but don't do anything.

Syncing files to an object store is inherently problematic, as a proper 
sync requires finalizing the object that holds the file. After 
finalization, additional writes to the file require a new object to be 
created and the old object to be copied and destroyed. This process 
results in an N-squared performance problem for files that are synced 
periodically as they are written, as is the case for qemu backups.
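To make the N-squared claim concrete, here is a toy cost model (not s3fs or rclone internals, just the copy-on-finalize behavior described above): if every chunk is followed by a sync that finalizes the object, then appending the next chunk first requires copying everything already finalized.

```python
# Toy model of copy-on-finalize write amplification. Assumes (as described
# above) that each sync finalizes the object and the next append must
# re-create it by copying the existing contents.

def bytes_rewritten(total_chunks: int, chunk_size: int) -> int:
    """Total bytes copied across a whole backup, assuming a sync after
    every chunk and a full object copy before each subsequent append."""
    copied = 0
    for already_written in range(total_chunks):
        # Appending chunk i requires copying the i chunks already finalized,
        # so the total is chunk_size * N*(N-1)/2 -- quadratic in N.
        copied += already_written * chunk_size
    return copied

chunk = 4 * 1024 * 1024  # 4 MiB chunks (placeholder size)
print(bytes_rewritten(1000, chunk))   # ~2 TB of copying for a 4 GB backup
print(bytes_rewritten(2000, chunk) / bytes_rewritten(1000, chunk))  # ~4x for 2x the chunks
```

Doubling the number of sync points roughly quadruples the data copied, which matches the behavior we see as backups grow.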

Empirically, s3fs implements fdatasync, and hence backups written to s3fs 
take an untenably long time. I can provide data and straces, if needed.
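For anyone who wants to reproduce this without qemu in the loop, a small probe like the following (a sketch; the path is a placeholder to point at the mount under test) shows the per-call fdatasync latency on s3fs versus a local filesystem or rclone-mount:

```python
# Sketch: time individual fdatasync() calls while appending to a file,
# to compare a local filesystem with an s3fs or rclone mount.
import os
import time

def timed_sync_writes(path: str, chunks: int = 8, chunk_size: int = 1 << 20):
    """Append `chunks` blocks of `chunk_size` bytes, calling fdatasync
    after each write, and return the per-call sync latencies in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        block = b"\0" * chunk_size
        for _ in range(chunks):
            os.write(fd, block)
            t0 = time.monotonic()
            os.fdatasync(fd)  # slow on s3fs; returns fast but does nothing durable on rclone-mount
            latencies.append(time.monotonic() - t0)
    finally:
        os.close(fd)
    return latencies

if __name__ == "__main__":
    # "/tmp/sync-probe.bin" is a placeholder; use a file on the mount under test.
    for i, secs in enumerate(timed_sync_writes("/tmp/sync-probe.bin")):
        print(f"fdatasync #{i}: {secs * 1000:.2f} ms")
```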

Backups written to rclone-mount are much faster, but there are obvious 
semantic problems. The backup job completes successfully before the file 
is actually stable in the object store. And in fact, a lot of the work of 
finalizing the file occurs during the "close" system call that is invoked 
as part of the qmp_blockdev_del operation. That syscall makes the 
operation take so long that other commands time out waiting to "acquire 
state change lock (held by monitor qemuProcessEventHandler)".

My questions for the group are: Has anyone else tried writing backups to 
file systems that don't have good support for fdatasync? And do you have 
any advice other than "Don't do that"?

Thanks for any help.

- Bryan Rosenburg, IBM
