Hi Bryan,

first of all, for your next question, please don't reply to a message in
an unrelated thread, but start a new email. This will give you a lot more
visibility because people generally use a threaded email view and decide
whether to read an email depending on whether the topic of that thread is
interesting to them.

On 27.04.2020 at 21:49, Bryan S Rosenburg wrote:
> Blockdev community,
>
> Our group would like to write block device backups directly to an object
> store, using an interface such as s3fs or rclone-mount. We've run into
> problems with both interfaces, and in both cases the problems revolve
> around fdatasync system calls. With s3fs, fdatasync calls are painfully
> slow. With rclone-mount, the calls are very fast but don't do anything.
>
> Syncing files to an object store is inherently problematic, as a proper
> sync requires finalizing the object that holds the file. After
> finalization, additional writes to the file require a new object to be
> created and the old object to be copied and destroyed. This process
> results in an N-squared performance problem for files that are synced
> periodically as they are written, as is the case for qemu backups.
>
> Empirically, s3fs implements fdatasync, and hence backups written to s3fs
> take an untenably long time. I can provide data and straces, if needed.
>
> Backups written to rclone-mount are much faster, but there are obvious
> semantic problems. The backup job completes successfully before the file
> is actually stable in the object store. And in fact, a lot of the work of
> finalizing the file occurs during the "close" system call that is invoked
> as part of the qmp_blockdev_del operation. The syscall causes that
> operation to take so long that other commands time out waiting to "acquire
> state change lock (held by monitor qemuProcessEventHandler)".
>
> My questions for the group are: Has anyone else tried writing backups to
> file systems that don't have good support for fdatasync, and do you have
> any advice other than "Don't do that."?

I think "don't do that" is a good answer actually.

You may want to put an NBD indirection between QEMU and your object store,
so that the close() syscall just blocks a qemu-nbd process that has
already closed its connection to QEMU, instead of blocking all of QEMU.

It is possible to disable fdatasync() by specifying cache=unsafe for the
block device, so you could avoid the penalty of repeated syncs on s3fs.

Of course, if s3fs requires an fsync before data is actually stable, you
can then no longer consider your backup completed when the backup block
job finishes successfully. You would have to issue an fsync manually and
wait for its result before you can consider the backup successful.

Kevin
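
For illustration, the manual fsync step mentioned above could look roughly
like the following. This is only a sketch: the function name sync_backup
and the path in the comment are made up, and it assumes that an fsync()
issued on a freshly opened file descriptor makes the file system flush
everything written to that file earlier. Whether s3fs actually behaves
that way would need to be checked.

```python
import os


def sync_backup(path: str) -> None:
    """Flush a finished backup file and wait for the result.

    Raises OSError if open(), fsync() or close() reports an error, in
    which case the backup should not be treated as successful.
    """
    # Read-only access is sufficient to call fsync() on Linux.
    fd = os.open(path, os.O_RDONLY)
    try:
        # Blocks until the file system reports the data as stable
        # (assumption: the mount really implements fsync).
        os.fsync(fd)
    finally:
        os.close(fd)


# Hypothetical usage, after the backup block job has completed:
#     sync_backup("/mnt/s3fs/backups/disk0.img")
```

Only when this returns without an exception would the backup count as
complete.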
