Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On Fri, 04/06 13:41, Paolo Bonzini wrote: > On 05/04/2018 14:55, Stefan Hajnoczi wrote: > > bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on > > src[qcow2]. The qcow2 block driver will invoke > > bdrv_co_copy_file_range_src() on src[file]. The file-posix driver will > > invoke bdrv_co_copy_file_range_dst() on dst[raw]. The raw driver will > > invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that > > src_bds (src[file]) is also file-posix and then goes ahead with > > copy_file_range(2). > > > > In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI, > > the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and > > the block layer can fall back to a traditional copy operation. > > > > With this approach src[qcow2] could take a lock or keep track of a > > serializing request struct so that other requests cannot interfere with > > the operation, and it's done in a natural way since we remain in the > > qcow2 function until the entire operation completes. There's no need > > for bookkeeping structs or callbacks. > > Could there be AB-BA deadlock if the guest attempts a concurrent copy > from A to B and from B to A? I don't think bs_src need to hold its locks when calling into bs_dst for mapping write ranges. So it should be safe. Fam
Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On 05/04/2018 14:55, Stefan Hajnoczi wrote: > bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on > src[qcow2]. The qcow2 block driver will invoke > bdrv_co_copy_file_range_src() on src[file]. The file-posix driver will > invoke bdrv_co_copy_file_range_dst() on dst[raw]. The raw driver will > invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that > src_bds (src[file]) is also file-posix and then goes ahead with > copy_file_range(2). > > In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI, > the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and > the block layer can fall back to a traditional copy operation. > > With this approach src[qcow2] could take a lock or keep track of a > serializing request struct so that other requests cannot interfere with > the operation, and it's done in a natural way since we remain in the > qcow2 function until the entire operation completes. There's no need > for bookkeeping structs or callbacks. Could there be AB-BA deadlock if the guest attempts a concurrent copy from A to B and from B to A? Paolo signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On Wed, Apr 04, 2018 at 09:49:23PM +0800, Fam Zheng wrote: > On Wed, 04/04 14:23, Stefan Hajnoczi wrote: > > On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote: > > > [Posting a preview RFC for the general idea discussion and internal API > > > review. > > > Libiscsi support is being worked on in the meantime.] > > > > > > This series introduces block layer API for copy offloading and makes use > > > of it > > > in qemu-img convert. > > > > > > For now we implemented the operation in local file protocol with > > > copy_file_range(2). Besides that it's possible to add similar to iscsi, > > > nfs > > > and potentially more. > > > > > > As far as its usage goes, in addition to qemu-img convert, we can emulate > > > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror. > > > > > > The new bdrv_co_map_range can also be an alternative way to implement > > > format > > > drivers in the future, once we make block/io.c use it in preadv/pwritev > > > paths. > > > > I posted concerns about the bdrv_co_map_range() interface. It would be > > safer to only have a copy_range() interface without exposing how data is > > mapped outside the driver where race conditions can occur and the format > > driver no longer has full control over file layout. > > It's a good point, but I couldn't think of a way to implement copy_range > between > two format drivers: both of them need to recurse down to their bs->file and > what > we eventually want is a copy_file_range() on two fds (or an iscsi equivalent): > > src[qcow2] -> dst[raw] > | | > v v > src[file]-> dst[file] > | | > v v >fd1 -> fd2 > copy_file_range > > Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and > call them from bdrv_co_copy_range(). This way the code path works pretty much > the same way to .bdrv_co_preadv/pwritev. Regarding the recursion, there are two phases: 1. Finding the leaf BDS of the source (the left side of your diagram) 2. finding the leaf BDS of the destination (the right side) Here is one way to represent this in BlockDriver: /* Map [offset, offset + nbytes) range onto a child of src_bs and * invoke bdrv_co_copy_file_range_src(child, ...), or invoke * bdrv_co_copy_file_range_dst() if the leaf child has been found. */ .bdrv_co_copy_file_range_src(src_bs, dst_bs, offset, nbytes) /* Map [offset, offset + nbytes) range onto a child of dst_bs and * invoke bdrv_co_copy_file_range_src(src_bs, child, ...), or perform * the copy operation if the leaf child has been found. Return * -ENOTSUP if src_bs and the leaf child have different BlockDrivers. */ .bdrv_co_copy_file_range_dst(src_bs, dst_bs, offset, nbytes) bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on src[qcow2]. The qcow2 block driver will invoke bdrv_co_copy_file_range_src() on src[file]. The file-posix driver will invoke bdrv_co_copy_file_range_dst() on dst[raw]. The raw driver will invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that src_bds (src[file]) is also file-posix and then goes ahead with copy_file_range(2). In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI, the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and the block layer can fall back to a traditional copy operation. With this approach src[qcow2] could take a lock or keep track of a serializing request struct so that other requests cannot interfere with the operation, and it's done in a natural way since we remain in the qcow2 function until the entire operation completes. There's no need for bookkeeping structs or callbacks. Stefan signature.asc Description: PGP signature
Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On 04/04/2018 15:49, Fam Zheng wrote: >> I posted concerns about the bdrv_co_map_range() interface. It would be >> safer to only have a copy_range() interface without exposing how data is >> mapped outside the driver where race conditions can occur and the format >> driver no longer has full control over file layout. > It's a good point, but I couldn't think of a way to implement copy_range > between > two format drivers: both of them need to recurse down to their bs->file and > what > we eventually want is a copy_file_range() on two fds (or an iscsi equivalent): > > src[qcow2] -> dst[raw] > | | > v v > src[file]-> dst[file] > | | > v v >fd1 -> fd2 > copy_file_range > > Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and > call them from bdrv_co_copy_range(). This way the code path works pretty much > the same way to .bdrv_co_preadv/pwritev. Or you could have bdrv_co_copy_range verify the mapping and return -EAGAIN if it is not valid anymore? (Then bdrv_co_map_range should fill in a BlockDriverMapping struct, or something like that). Paolo
Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On Wed, 04/04 14:23, Stefan Hajnoczi wrote: > On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote: > > [Posting a preview RFC for the general idea discussion and internal API > > review. > > Libiscsi support is being worked on in the meantime.] > > > > This series introduces block layer API for copy offloading and makes use of > > it > > in qemu-img convert. > > > > For now we implemented the operation in local file protocol with > > copy_file_range(2). Besides that it's possible to add similar to iscsi, nfs > > and potentially more. > > > > As far as its usage goes, in addition to qemu-img convert, we can emulate > > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror. > > > > The new bdrv_co_map_range can also be an alternative way to implement format > > drivers in the future, once we make block/io.c use it in preadv/pwritev > > paths. > > I posted concerns about the bdrv_co_map_range() interface. It would be > safer to only have a copy_range() interface without exposing how data is > mapped outside the driver where race conditions can occur and the format > driver no longer has full control over file layout. It's a good point, but I couldn't think of a way to implement copy_range between two format drivers: both of them need to recurse down to their bs->file and what we eventually want is a copy_file_range() on two fds (or an iscsi equivalent): src[qcow2] -> dst[raw] | | v v src[file]-> dst[file] | | v v fd1 -> fd2 copy_file_range Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and call them from bdrv_co_copy_range(). This way the code path works pretty much the same way to .bdrv_co_preadv/pwritev. Fam
Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote: > [Posting a preview RFC for the general idea discussion and internal API > review. > Libiscsi support is being worked on in the meantime.] > > This series introduces block layer API for copy offloading and makes use of it > in qemu-img convert. > > For now we implemented the operation in local file protocol with > copy_file_range(2). Besides that it's possible to add similar to iscsi, nfs > and potentially more. > > As far as its usage goes, in addition to qemu-img convert, we can emulate > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror. > > The new bdrv_co_map_range can also be an alternative way to implement format > drivers in the future, once we make block/io.c use it in preadv/pwritev paths. I posted concerns about the bdrv_co_map_range() interface. It would be safer to only have a copy_range() interface without exposing how data is mapped outside the driver where race conditions can occur and the format driver no longer has full control over file layout. Stefan signature.asc Description: PGP signature