Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Wed, Dec 07, 2022 at 07:19:00PM +0800, Ming Lei wrote:
> On Wed, Dec 07, 2022 at 11:24:00AM +0530, Nitesh Shetty wrote:
> > On Tue, Nov 29, 2022 at 05:14:28PM +0530, Nitesh Shetty wrote:
> > > On Thu, Nov 24, 2022 at 08:03:56AM +0800, Ming Lei wrote:
> > > > On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> > > > > On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > > > > > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > > > > > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > > > > > > and an array of (source, destination and copy length) tuples.
> > > > > > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > > > > > bio pair with a token as payload and submitted to the device in order.
> > > > > > > Read request populates token with source specific information which
> > > > > > > is then passed with write request.
> > > > > > > This design is courtesy Mikulas Patocka's token based copy
> > > > > >
> > > > > > I thought this patchset was just for enabling the copy command which is
> > > > > > supported by hardware. But it turns out it isn't, because blk_copy_offload()
> > > > > > still submits read/write bios for doing the copy.
> > > > > >
> > > > > > I am just wondering why not let copy_file_range() cover this kind of copy,
> > > > > > since the framework is already there.
> > > > >
> > > > > Main goal was to enable copy command, but community suggested to add
> > > > > copy emulation as well.
> > > > >
> > > > > blk_copy_offload - actually issues copy command in driver layer.
> > > > > The way read/write BIOs are perceived is different for copy offload.
> > > > > In copy offload we check REQ_COPY flag in NVMe driver layer to issue
> > > > > copy command. But we missed adding it in other drivers, where they
> > > > > might be treated as normal READ/WRITE.
> > > > >
> > > > > blk_copy_emulate - is used if we fail or if device doesn't support native
> > > > > copy offload command. Here we do READ/WRITE. Using copy_file_range for
> > > > > emulation might be possible, but we see 2 issues here.
> > > > > 1. We explored possibility of pulling dm-kcopyd to block layer so that we
> > > > > can readily use it. But we found it had many dependencies from dm-layer.
> > > > > So later dropped that idea.
> > > >
> > > > Is it just because dm-kcopyd supports async copy? If yes, I believe we
> > > > can rely on io_uring for implementing async copy_file_range, which will
> > > > be generic interface for async copy, and could get better perf.
> > >
> > > It supports both sync and async. But used only inside dm-layer.
> > > Async version of copy_file_range can help, using io-uring can be helpful
> > > for user, but in-kernel users can't use uring.
> > >
> > > > > 2. copy_file_range, for block device at least we saw few checks which fail
> > > > > it for raw block device. At this point I don't know much about the history of
> > > > > why such check is present.
> > > >
> > > > Got it, but IMO the check in generic_copy_file_checks() can be
> > > > relaxed to cover blkdev because splice does support blkdev.
> > > >
> > > > Then your bdev offload copy work can be simplified into:
> > > >
> > > > 1) implement .copy_file_range for def_blk_fops, suppose it is
> > > > blkdev_copy_file_range()
> > > >
> > > > 2) inside blkdev_copy_file_range()
> > > >
> > > > - if the bdev supports offload copy, just submit one bio to the device,
> > > > and this will be converted to one pt req to device
> > > >
> > > > - otherwise, fallback to generic_copy_file_range()
> >
> > Actually we sent initial version with single bio, but later community
> > suggested two bios is a must for offload, main reasoning being
>
> Is there any link which holds the discussion?

This[1] is the starting thread for LSF/MM which Chaitanya organized and it
was a conference call. Also in 2022 LSF/MM there was a discussion on copy
topic as well. One of the main conclusions was using 2 bios as a must-have.
Bart has summarized previous copy efforts[2] and Martin too[3], which might
be of help in understanding why 2 bios are a must.

> > dm-layer, Xcopy, copy across namespace compatibility.
>
> But dm kcopy has supported bdev copy already, so once your patch is
> ready, dm kcopy can just send one bio with REQ_COPY if the device
> supports offload command, otherwise the current dm kcopy code can work
> as before.
>
> > > We will check the feasibility and try to implement the scheme in next
> > > versions.
> > > It would be helpful if someone in the community knows why such checks were
> > > present. We see copy_file_range accepts only regular file. Was it
> > > designed only for regular files or can we extend it to regular block
> > > device?
> >
> > As you suggested we
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Tue, Nov 29, 2022 at 05:14:28PM +0530, Nitesh Shetty wrote:
> On Thu, Nov 24, 2022 at 08:03:56AM +0800, Ming Lei wrote:
> > On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> > > On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > > > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > > > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > > > > and an array of (source, destination and copy length) tuples.
> > > > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > > > bio pair with a token as payload and submitted to the device in order.
> > > > > Read request populates token with source specific information which
> > > > > is then passed with write request.
> > > > > This design is courtesy Mikulas Patocka's token based copy
> > > >
> > > > I thought this patchset was just for enabling the copy command which is
> > > > supported by hardware. But it turns out it isn't, because blk_copy_offload()
> > > > still submits read/write bios for doing the copy.
> > > >
> > > > I am just wondering why not let copy_file_range() cover this kind of copy,
> > > > since the framework is already there.
> > >
> > > Main goal was to enable copy command, but community suggested to add
> > > copy emulation as well.
> > >
> > > blk_copy_offload - actually issues copy command in driver layer.
> > > The way read/write BIOs are perceived is different for copy offload.
> > > In copy offload we check REQ_COPY flag in NVMe driver layer to issue
> > > copy command. But we missed adding it in other drivers, where they
> > > might be treated as normal READ/WRITE.
> > >
> > > blk_copy_emulate - is used if we fail or if device doesn't support native
> > > copy offload command. Here we do READ/WRITE. Using copy_file_range for
> > > emulation might be possible, but we see 2 issues here.
> > > 1. We explored possibility of pulling dm-kcopyd to block layer so that we
> > > can readily use it. But we found it had many dependencies from dm-layer.
> > > So later dropped that idea.
> >
> > Is it just because dm-kcopyd supports async copy? If yes, I believe we
> > can rely on io_uring for implementing async copy_file_range, which will
> > be generic interface for async copy, and could get better perf.
>
> It supports both sync and async. But used only inside dm-layer.
> Async version of copy_file_range can help, using io-uring can be helpful
> for user, but in-kernel users can't use uring.
>
> > > 2. copy_file_range, for block device at least we saw few checks which fail
> > > it for raw block device. At this point I don't know much about the history of
> > > why such check is present.
> >
> > Got it, but IMO the check in generic_copy_file_checks() can be
> > relaxed to cover blkdev because splice does support blkdev.
> >
> > Then your bdev offload copy work can be simplified into:
> >
> > 1) implement .copy_file_range for def_blk_fops, suppose it is
> > blkdev_copy_file_range()
> >
> > 2) inside blkdev_copy_file_range()
> >
> > - if the bdev supports offload copy, just submit one bio to the device,
> > and this will be converted to one pt req to device
> >
> > - otherwise, fallback to generic_copy_file_range()

Actually we sent initial version with single bio, but later community
suggested two bios is a must for offload, main reasoning being
dm-layer, Xcopy, copy across namespace compatibility.

> We will check the feasibility and try to implement the scheme in next versions.
> It would be helpful if someone in the community knows why such checks were
> present. We see copy_file_range accepts only regular file. Was it
> designed only for regular files or can we extend it to regular block
> device?

As you suggested we were able to integrate def_blk_fops and run with user
application, but we see one main issue with this approach.
Using blkdev_copy_file_range requires having 2 file descriptors, which is
not possible for in-kernel users such as fabrics/dm-kcopyd which have only
bdev descriptors.
Do you have any plumbing suggestions here?

> > > > When I was researching pipe/splice code for supporting ublk zero copy[1], I
> > > > got an idea for async copy_file_range(), such as: io_uring based
> > > > direct splice, user backed intermediate buffer, still zero copy. If these
> > > > ideas are finally implemented, we could get super-fast generic offload copy,
> > > > and bdev copy is really covered too.
> > > >
> > > > [1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming@redhat.com/
> > >
> > > Seems interesting, we will take a look into this.
> >
> > BTW, that is probably one direction of ublk's async zero copy IO too.
> >
> > Thanks,
> > Ming
>
> Thanks,
> Nitesh

Thanks,
Nitesh Shetty

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Wed, Dec 07, 2022 at 11:24:00AM +0530, Nitesh Shetty wrote:
> On Tue, Nov 29, 2022 at 05:14:28PM +0530, Nitesh Shetty wrote:
> > On Thu, Nov 24, 2022 at 08:03:56AM +0800, Ming Lei wrote:
> > > On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> > > > On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > > > > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > > > > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > > > > > and an array of (source, destination and copy length) tuples.
> > > > > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > > > > bio pair with a token as payload and submitted to the device in order.
> > > > > > Read request populates token with source specific information which
> > > > > > is then passed with write request.
> > > > > > This design is courtesy Mikulas Patocka's token based copy
> > > > >
> > > > > I thought this patchset was just for enabling the copy command which is
> > > > > supported by hardware. But it turns out it isn't, because blk_copy_offload()
> > > > > still submits read/write bios for doing the copy.
> > > > >
> > > > > I am just wondering why not let copy_file_range() cover this kind of copy,
> > > > > since the framework is already there.
> > > >
> > > > Main goal was to enable copy command, but community suggested to add
> > > > copy emulation as well.
> > > >
> > > > blk_copy_offload - actually issues copy command in driver layer.
> > > > The way read/write BIOs are perceived is different for copy offload.
> > > > In copy offload we check REQ_COPY flag in NVMe driver layer to issue
> > > > copy command. But we missed adding it in other drivers, where they
> > > > might be treated as normal READ/WRITE.
> > > >
> > > > blk_copy_emulate - is used if we fail or if device doesn't support native
> > > > copy offload command. Here we do READ/WRITE. Using copy_file_range for
> > > > emulation might be possible, but we see 2 issues here.
> > > > 1. We explored possibility of pulling dm-kcopyd to block layer so that we
> > > > can readily use it. But we found it had many dependencies from dm-layer.
> > > > So later dropped that idea.
> > >
> > > Is it just because dm-kcopyd supports async copy? If yes, I believe we
> > > can rely on io_uring for implementing async copy_file_range, which will
> > > be generic interface for async copy, and could get better perf.
> >
> > It supports both sync and async. But used only inside dm-layer.
> > Async version of copy_file_range can help, using io-uring can be helpful
> > for user, but in-kernel users can't use uring.
> >
> > > > 2. copy_file_range, for block device at least we saw few checks which fail
> > > > it for raw block device. At this point I don't know much about the history of
> > > > why such check is present.
> > >
> > > Got it, but IMO the check in generic_copy_file_checks() can be
> > > relaxed to cover blkdev because splice does support blkdev.
> > >
> > > Then your bdev offload copy work can be simplified into:
> > >
> > > 1) implement .copy_file_range for def_blk_fops, suppose it is
> > > blkdev_copy_file_range()
> > >
> > > 2) inside blkdev_copy_file_range()
> > >
> > > - if the bdev supports offload copy, just submit one bio to the device,
> > > and this will be converted to one pt req to device
> > >
> > > - otherwise, fallback to generic_copy_file_range()
>
> Actually we sent initial version with single bio, but later community
> suggested two bios is a must for offload, main reasoning being

Is there any link which holds the discussion?

> dm-layer, Xcopy, copy across namespace compatibility.

But dm kcopy has supported bdev copy already, so once your patch is
ready, dm kcopy can just send one bio with REQ_COPY if the device
supports offload command, otherwise the current dm kcopy code can work
as before.

> > We will check the feasibility and try to implement the scheme in next
> > versions.
> > It would be helpful if someone in the community knows why such checks were
> > present. We see copy_file_range accepts only regular file. Was it
> > designed only for regular files or can we extend it to regular block
> > device?
>
> As you suggested we were able to integrate def_blk_fops and
> run with user application, but we see one main issue with this approach.
> Using blkdev_copy_file_range requires having 2 file descriptors, which
> is not possible for in-kernel users such as fabrics/dm-kcopyd which have
> only bdev descriptors.
> Do you have any plumbing suggestions here?

What is the fabrics kernel user? Any kernel target code (such as nvme
target) has file/bdev path available, vfs_copy_file_range() should be fine.

Also IMO, kernel copy user shouldn't be important long term, especially if
io_uring copy_file_range() can be supported, forwarding to userspace not
only gets better performance,
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Thu, Nov 24, 2022 at 08:03:56AM +0800, Ming Lei wrote:
> On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> > On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > > > and an array of (source, destination and copy length) tuples.
> > > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > > bio pair with a token as payload and submitted to the device in order.
> > > > Read request populates token with source specific information which
> > > > is then passed with write request.
> > > > This design is courtesy Mikulas Patocka's token based copy
> > >
> > > I thought this patchset was just for enabling the copy command which is
> > > supported by hardware. But it turns out it isn't, because blk_copy_offload()
> > > still submits read/write bios for doing the copy.
> > >
> > > I am just wondering why not let copy_file_range() cover this kind of copy,
> > > since the framework is already there.
> >
> > Main goal was to enable copy command, but community suggested to add
> > copy emulation as well.
> >
> > blk_copy_offload - actually issues copy command in driver layer.
> > The way read/write BIOs are perceived is different for copy offload.
> > In copy offload we check REQ_COPY flag in NVMe driver layer to issue
> > copy command. But we missed adding it in other drivers, where they
> > might be treated as normal READ/WRITE.
> >
> > blk_copy_emulate - is used if we fail or if device doesn't support native
> > copy offload command. Here we do READ/WRITE. Using copy_file_range for
> > emulation might be possible, but we see 2 issues here.
> > 1. We explored possibility of pulling dm-kcopyd to block layer so that we
> > can readily use it. But we found it had many dependencies from dm-layer.
> > So later dropped that idea.
>
> Is it just because dm-kcopyd supports async copy? If yes, I believe we
> can rely on io_uring for implementing async copy_file_range, which will
> be generic interface for async copy, and could get better perf.

It supports both sync and async. But used only inside dm-layer.
Async version of copy_file_range can help, using io-uring can be helpful
for user, but in-kernel users can't use uring.

> > 2. copy_file_range, for block device at least we saw few checks which fail
> > it for raw block device. At this point I don't know much about the history of
> > why such check is present.
>
> Got it, but IMO the check in generic_copy_file_checks() can be
> relaxed to cover blkdev because splice does support blkdev.
>
> Then your bdev offload copy work can be simplified into:
>
> 1) implement .copy_file_range for def_blk_fops, suppose it is
> blkdev_copy_file_range()
>
> 2) inside blkdev_copy_file_range()
>
> - if the bdev supports offload copy, just submit one bio to the device,
> and this will be converted to one pt req to device
>
> - otherwise, fallback to generic_copy_file_range()

We will check the feasibility and try to implement the scheme in next
versions.
It would be helpful if someone in the community knows why such checks were
present. We see copy_file_range accepts only regular file. Was it
designed only for regular files or can we extend it to regular block
device?

> > > When I was researching pipe/splice code for supporting ublk zero copy[1], I
> > > got an idea for async copy_file_range(), such as: io_uring based
> > > direct splice, user backed intermediate buffer, still zero copy. If these
> > > ideas are finally implemented, we could get super-fast generic offload copy,
> > > and bdev copy is really covered too.
> > >
> > > [1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming@redhat.com/
> >
> > Seems interesting, we will take a look into this.
>
> BTW, that is probably one direction of ublk's async zero copy IO too.
>
> Thanks,
> Ming

Thanks,
Nitesh
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Wed, Nov 23, 2022 at 03:37:12PM +0530, Nitesh Shetty wrote:
> On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> > On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > > and an array of (source, destination and copy length) tuples.
> > > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > > bio pair with a token as payload and submitted to the device in order.
> > > Read request populates token with source specific information which
> > > is then passed with write request.
> > > This design is courtesy Mikulas Patocka's token based copy
> >
> > I thought this patchset was just for enabling the copy command which is
> > supported by hardware. But it turns out it isn't, because blk_copy_offload()
> > still submits read/write bios for doing the copy.
> >
> > I am just wondering why not let copy_file_range() cover this kind of copy,
> > since the framework is already there.
>
> Main goal was to enable copy command, but community suggested to add
> copy emulation as well.
>
> blk_copy_offload - actually issues copy command in driver layer.
> The way read/write BIOs are perceived is different for copy offload.
> In copy offload we check REQ_COPY flag in NVMe driver layer to issue
> copy command. But we missed adding it in other drivers, where they
> might be treated as normal READ/WRITE.
>
> blk_copy_emulate - is used if we fail or if device doesn't support native
> copy offload command. Here we do READ/WRITE. Using copy_file_range for
> emulation might be possible, but we see 2 issues here.
> 1. We explored possibility of pulling dm-kcopyd to block layer so that we
> can readily use it. But we found it had many dependencies from dm-layer.
> So later dropped that idea.

Is it just because dm-kcopyd supports async copy? If yes, I believe we
can rely on io_uring for implementing async copy_file_range, which will
be generic interface for async copy, and could get better perf.

> 2. copy_file_range, for block device at least we saw few checks which fail
> it for raw block device. At this point I don't know much about the history of
> why such check is present.

Got it, but IMO the check in generic_copy_file_checks() can be
relaxed to cover blkdev because splice does support blkdev.

Then your bdev offload copy work can be simplified into:

1) implement .copy_file_range for def_blk_fops, suppose it is
blkdev_copy_file_range()

2) inside blkdev_copy_file_range()

- if the bdev supports offload copy, just submit one bio to the device,
and this will be converted to one pt req to device

- otherwise, fallback to generic_copy_file_range()

> > When I was researching pipe/splice code for supporting ublk zero copy[1], I
> > got an idea for async copy_file_range(), such as: io_uring based
> > direct splice, user backed intermediate buffer, still zero copy. If these
> > ideas are finally implemented, we could get super-fast generic offload copy,
> > and bdev copy is really covered too.
> >
> > [1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming@redhat.com/
>
> Seems interesting, we will take a look into this.

BTW, that is probably one direction of ublk's async zero copy IO too.

Thanks,
Ming
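The two-step scheme proposed in the message above (a .copy_file_range hook on the block device file operations that issues one offload command when the device supports it, and otherwise falls back to a generic copy) can be modelled in plain userspace C. This is only a sketch of the dispatch logic, not kernel code: every toy_* name, the in-memory "device", and the offload_cmds counter are assumptions made up for illustration, and the real def_blk_fops, bio submission and generic_copy_file_range() are not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical miniature of a file_operations table with one optional hook. */
struct toy_file_ops {
	long (*copy_file_range)(struct toy_file *in, size_t pos_in,
				struct toy_file *out, size_t pos_out, size_t len);
};

/* Hypothetical open block device backed by memory. */
struct toy_file {
	const struct toy_file_ops *ops;
	unsigned char *data;
	size_t size;
	int supports_offload;		/* models "the bdev supports offload copy" */
	unsigned int offload_cmds;	/* single copy commands received so far */
};

/* Generic fallback: plain read into a bounce buffer, then write. */
static long toy_generic_copy(struct toy_file *in, size_t pos_in,
			     struct toy_file *out, size_t pos_out, size_t len)
{
	unsigned char bounce[32];
	size_t done = 0;

	while (done < len) {
		size_t n = len - done < sizeof(bounce) ? len - done : sizeof(bounce);

		memcpy(bounce, in->data + pos_in + done, n);
		memcpy(out->data + pos_out + done, bounce, n);
		done += n;
	}
	return (long)len;
}

/* Step 2 of the scheme: offload if possible, otherwise fall back. */
static long toy_blkdev_copy_file_range(struct toy_file *in, size_t pos_in,
				       struct toy_file *out, size_t pos_out,
				       size_t len)
{
	if (in->supports_offload && out->supports_offload) {
		/* One command carries the whole copy; the device moves the data. */
		memmove(out->data + pos_out, in->data + pos_in, len);
		out->offload_cmds++;
		return (long)len;
	}
	return toy_generic_copy(in, pos_in, out, pos_out, len);
}

/* Step 1 of the scheme: install the hook in the ops table. */
static const struct toy_file_ops toy_def_blk_fops = {
	.copy_file_range = toy_blkdev_copy_file_range,
};

/* Entry point: use the hook when present, else the generic path. */
long toy_vfs_copy_file_range(struct toy_file *in, size_t pos_in,
			     struct toy_file *out, size_t pos_out, size_t len)
{
	assert(pos_in + len <= in->size && pos_out + len <= out->size);
	if (out->ops && out->ops->copy_file_range)
		return out->ops->copy_file_range(in, pos_in, out, pos_out, len);
	return toy_generic_copy(in, pos_in, out, pos_out, len);
}
```

The caller never needs to know which path was taken, which is the attraction of this design. The model leaves out the concerns raised elsewhere in the thread, such as in-kernel users that hold only a bdev and no file.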
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Wed, Nov 23, 2022 at 04:04:18PM +0800, Ming Lei wrote:
> On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> > Introduce blkdev_issue_copy which supports source and destination bdevs,
> > and an array of (source, destination and copy length) tuples.
> > Introduce REQ_COPY copy offload operation flag. Create a read-write
> > bio pair with a token as payload and submitted to the device in order.
> > Read request populates token with source specific information which
> > is then passed with write request.
> > This design is courtesy Mikulas Patocka's token based copy
>
> I thought this patchset was just for enabling the copy command which is
> supported by hardware. But it turns out it isn't, because blk_copy_offload()
> still submits read/write bios for doing the copy.
>
> I am just wondering why not let copy_file_range() cover this kind of copy,
> since the framework is already there.

Main goal was to enable copy command, but community suggested to add
copy emulation as well.

blk_copy_offload - actually issues copy command in driver layer.
The way read/write BIOs are perceived is different for copy offload.
In copy offload we check REQ_COPY flag in NVMe driver layer to issue
copy command. But we missed adding it in other drivers, where they
might be treated as normal READ/WRITE.

blk_copy_emulate - is used if we fail or if device doesn't support native
copy offload command. Here we do READ/WRITE. Using copy_file_range for
emulation might be possible, but we see 2 issues here.
1. We explored possibility of pulling dm-kcopyd to block layer so that we
can readily use it. But we found it had many dependencies from dm-layer.
So later dropped that idea.
2. copy_file_range, for block device at least we saw few checks which fail
it for raw block device. At this point I don't know much about the history of
why such check is present.

> When I was researching pipe/splice code for supporting ublk zero copy[1], I
> got an idea for async copy_file_range(), such as: io_uring based
> direct splice, user backed intermediate buffer, still zero copy. If these
> ideas are finally implemented, we could get super-fast generic offload copy,
> and bdev copy is really covered too.
>
> [1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming@redhat.com/

Seems interesting, we will take a look into this.

Thanks,
Nitesh
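The difference between the offload path and the emulation path described in the message above can be illustrated with a small userspace model in C. It is a sketch under stated assumptions, not the patch's implementation: the toy_* types and functions are hypothetical, the "devices" are byte arrays, and the token layout is invented. What it is meant to show is that the REQ_COPY read step moves no data and only fills a token, that the write step consumes the token, and that ordinary READ/WRITE through a bounce buffer is used when a device lacks native copy support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a block device. */
struct toy_bdev {
	unsigned char *data;
	size_t size;
	bool has_copy_offload;	/* models "device supports native copy" */
};

/* Hypothetical token: filled by the read step, consumed by the write step. */
struct toy_copy_token {
	const struct toy_bdev *src;
	size_t src_off;
	size_t len;
};

/* Step 1: the copy "read" moves no data; it only records the source range. */
static void toy_copy_read(const struct toy_bdev *src, size_t off, size_t len,
			  struct toy_copy_token *tok)
{
	tok->src = src;
	tok->src_off = off;
	tok->len = len;
}

/* Step 2: the copy "write" passes the token on; the device does the copy. */
static void toy_copy_write(struct toy_bdev *dst, size_t off,
			   const struct toy_copy_token *tok)
{
	memmove(dst->data + off, tok->src->data + tok->src_off, tok->len);
}

/* Emulation: normal READ into a bounce buffer followed by normal WRITE.
 * Assumes the source and destination ranges do not overlap. */
static void toy_copy_emulate(const struct toy_bdev *src, size_t src_off,
			     struct toy_bdev *dst, size_t dst_off, size_t len)
{
	unsigned char bounce[64];

	while (len) {
		size_t n = len < sizeof(bounce) ? len : sizeof(bounce);

		memcpy(bounce, src->data + src_off, n);
		memcpy(dst->data + dst_off, bounce, n);
		src_off += n;
		dst_off += n;
		len -= n;
	}
}

/* Returns true when the offload path was taken, false when emulated. */
bool toy_issue_copy(const struct toy_bdev *src, size_t src_off,
		    struct toy_bdev *dst, size_t dst_off, size_t len)
{
	assert(src_off + len <= src->size && dst_off + len <= dst->size);

	if (src->has_copy_offload && dst->has_copy_offload) {
		struct toy_copy_token tok;

		toy_copy_read(src, src_off, len, &tok);	/* read bio */
		toy_copy_write(dst, dst_off, &tok);	/* write bio */
		return true;
	}
	toy_copy_emulate(src, src_off, dst, dst_off, len);
	return false;
}
```

Both paths leave the destination with the same bytes. Only the offload path avoids pulling the data through host memory, which in this model is the bounce buffer.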
Re: [dm-devel] [PATCH v5 02/10] block: Add copy offload support infrastructure
On Wed, Nov 23, 2022 at 11:28:19AM +0530, Nitesh Shetty wrote:
> Introduce blkdev_issue_copy which supports source and destination bdevs,
> and an array of (source, destination and copy length) tuples.
> Introduce REQ_COPY copy offload operation flag. Create a read-write
> bio pair with a token as payload and submitted to the device in order.
> Read request populates token with source specific information which
> is then passed with write request.
> This design is courtesy Mikulas Patocka's token based copy

I thought this patchset was just for enabling the copy command which is
supported by hardware. But it turns out it isn't, because blk_copy_offload()
still submits read/write bios for doing the copy.

I am just wondering why not let copy_file_range() cover this kind of copy,
since the framework is already there.

When I was researching pipe/splice code for supporting ublk zero copy[1], I
got an idea for async copy_file_range(), such as: io_uring based
direct splice, user backed intermediate buffer, still zero copy. If these
ideas are finally implemented, we could get super-fast generic offload copy,
and bdev copy is really covered too.

[1] https://lore.kernel.org/linux-block/20221103085004.1029763-1-ming@redhat.com/

thanks,
Ming