Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-13 Thread Chaitanya Kulkarni
On 4/13/21 11:26, Javier González wrote:
>>> I believe there is space for extensions to simple copy. But given the
>>> experience with XCOPY, I can imagine that changes will be incremental,
>>> based on very specific use cases.
>>>
>>> I think getting support upstream and bringing deployed cases is a very
>>> good start.
>> Copying data (files) within the controller/subsystem from ns_A to ns_B 
>> using NVMf will reduce network BW and memory BW in the host server.
>>
>> This feature is well known and the use case is well known.
> Definitely.
>

I have working nvmet code for simple copy; I'm waiting for the host
interface for REQ_OP_COPY to be resolved so I can post it with this series.

Let me know if someone wants to collaborate offline on that.

IMHO we first need to sort out the host-side interface, which has been
a challenge for years and, judging by the history, is not easy to get
right.




Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-13 Thread Javier González

On 13.04.2021 18:38, Max Gurtovoy wrote:


On 4/11/2021 10:26 PM, Javier González wrote:

On 11.04.2021 12:10, Max Gurtovoy wrote:


On 4/10/2021 9:32 AM, Javier González wrote:
On 10 Apr 2021, at 02.30, Chaitanya Kulkarni wrote:


On 4/9/21 17:22, Max Gurtovoy wrote:

On 2/19/2021 2:45 PM, SelvaKumar S wrote:
This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is used to copy
multiple contiguous ranges (source_ranges) of LBA's to a single destination
LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices rather copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and no of source range is
equal to 1, this interface can be extended in future to support SCSI XCOPY.

Any idea why this TP wasn't designed for copy offload between 2
different namespaces in the same controller ?

Yes, it was the first attempt so to keep it simple.

Further work is needed to add incremental TP so that we can also do a copy
between the name-spaces of same controller (if we can't already) and to the
namespaces that belongs to the different controller.


And a simple copy will be the case where the src_nsid == dst_nsid ?

Also why there are multiple source ranges and only one dst range ? We
could add a bit to indicate if this range is src or dst..
One of the target use cases was ZNS in order to avoid fabric 
transfers during host GC. You can see how this plays well with 
several zone ranges and a single zone destination.


If we start getting support in Linux through the different past 
copy offload efforts, I’m sure we can extend this TP in the 
future.


But the "copy" command IMO is more general than the ZNS GC case, 
that can be a private case of copy, isn't it ?


It applies to any namespace type, so yes. I just wanted to give you the
background for the current "simple" scope through one of the use cases
that was in mind.

We can get a big benefit of offloading the data copy from one ns 
to another in the same controller and even in different 
controllers in the same subsystem.


Definitely.



Do you think the extension should be to "copy" command or to 
create a new command "x_copy" for copying to different destination 
ns ?


I believe there is space for extensions to simple copy. But given the
experience with XCOPY, I can imagine that changes will be incremental,
based on very specific use cases.

I think getting support upstream and bringing deployed cases is a very
good start.


Copying data (files) within the controller/subsystem from ns_A to ns_B 
using NVMf will reduce network BW and memory BW in the host server.


This feature is well known and the use case is well known.


Definitely.



The question is whether we implement it in a vendor-specific manner or
add it to the specification.


I prefer adding it to the spec :)


Agree. Let's build up on top of Simple Copy. We can talk about it
offline in the context of the NVMe TWG.


Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-13 Thread Max Gurtovoy



On 4/11/2021 10:26 PM, Javier González wrote:

On 11.04.2021 12:10, Max Gurtovoy wrote:


On 4/10/2021 9:32 AM, Javier González wrote:
On 10 Apr 2021, at 02.30, Chaitanya Kulkarni wrote:


On 4/9/21 17:22, Max Gurtovoy wrote:

On 2/19/2021 2:45 PM, SelvaKumar S wrote:
This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is used to copy
multiple contiguous ranges (source_ranges) of LBA's to a single destination
LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices rather copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and no of source range is
equal to 1, this interface can be extended in future to support SCSI XCOPY.

Any idea why this TP wasn't designed for copy offload between 2
different namespaces in the same controller ?

Yes, it was the first attempt so to keep it simple.

Further work is needed to add incremental TP so that we can also do a copy
between the name-spaces of same controller (if we can't already) and to the
namespaces that belongs to the different controller.


And a simple copy will be the case where the src_nsid == dst_nsid ?

Also why there are multiple source ranges and only one dst range ? We
could add a bit to indicate if this range is src or dst..
One of the target use cases was ZNS in order to avoid fabric 
transfers during host GC. You can see how this plays well with 
several zone ranges and a single zone destination.


If we start getting support in Linux through the different past copy 
offload efforts, I’m sure we can extend this TP in the future.


But the "copy" command IMO is more general than the ZNS GC case, that 
can be a private case of copy, isn't it ?


It applies to any namespace type, so yes. I just wanted to give you the
background for the current "simple" scope through one of the use cases
that was in mind.

We can get a big benefit of offloading the data copy from one ns to 
another in the same controller and even in different controllers in 
the same subsystem.


Definitely.



Do you think the extension should be to "copy" command or to create a 
new command "x_copy" for copying to different destination ns ?


I believe there is space for extensions to simple copy. But given the
experience with XCOPY, I can imagine that changes will be incremental,
based on very specific use cases.

I think getting support upstream and bringing deployed cases is a very
good start.


Copying data (files) within the controller/subsystem from ns_A to ns_B 
using NVMf will reduce network BW and memory BW in the host server.


This feature is well known and the use case is well known.

The question is whether we implement it in a vendor-specific manner or
add it to the specification.


I prefer adding it to the spec :)




Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-11 Thread Javier González

On 11.04.2021 12:10, Max Gurtovoy wrote:


On 4/10/2021 9:32 AM, Javier González wrote:

On 10 Apr 2021, at 02.30, Chaitanya Kulkarni  wrote:

On 4/9/21 17:22, Max Gurtovoy wrote:

On 2/19/2021 2:45 PM, SelvaKumar S wrote:
This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is  used to copy
multiple contiguous ranges (source_ranges) of LBA's to a single destination
LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices rather copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and no of source range is
equal to 1, this interface can be extended in future to support SCSI XCOPY.

Any idea why this TP wasn't designed for copy offload between 2
different namespaces in the same controller ?

Yes, it was the first attempt so to keep it simple.

Further work is needed to add incremental TP so that we can also do a copy
between the name-spaces of same controller (if we can't already) and to the
namespaces that belongs to the different controller.


And a simple copy will be the case where the src_nsid == dst_nsid ?

Also why there are multiple source ranges and only one dst range ? We
could add a bit to indicate if this range is src or dst..

One of the target use cases was ZNS in order to avoid fabric transfers during 
host GC. You can see how this plays well with several zone ranges and a single 
zone destination.

If we start getting support in Linux through the different past copy offload 
efforts, I’m sure we can extend this TP in the future.


But the "copy" command IMO is more general than the ZNS GC case, that 
can be a private case of copy, isn't it ?


It applies to any namespace type, so yes. I just wanted to give you the
background for the current "simple" scope through one of the use cases
that was in mind.

We can get a big benefit of offloading the data copy from one ns to 
another in the same controller and even in different controllers in 
the same subsystem.


Definitely.



Do you think the extension should be to "copy" command or to create a 
new command "x_copy" for copying to different destination ns ?


I believe there is space for extensions to simple copy. But given the
experience with XCOPY, I can imagine that changes will be incremental,
based on very specific use cases.

I think getting support upstream and bringing deployed cases is a very
good start.


Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-11 Thread Max Gurtovoy



On 4/10/2021 9:32 AM, Javier González wrote:

On 10 Apr 2021, at 02.30, Chaitanya Kulkarni  wrote:

On 4/9/21 17:22, Max Gurtovoy wrote:

On 2/19/2021 2:45 PM, SelvaKumar S wrote:
This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is  used to copy
multiple contiguous ranges (source_ranges) of LBA's to a single destination
LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices rather copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and no of source range is
equal to 1, this interface can be extended in future to support SCSI XCOPY.

Any idea why this TP wasn't designed for copy offload between 2
different namespaces in the same controller ?

Yes, it was the first attempt so to keep it simple.

Further work is needed to add incremental TP so that we can also do a copy
between the name-spaces of same controller (if we can't already) and to the
namespaces that belongs to the different controller.


And a simple copy will be the case where the src_nsid == dst_nsid ?

Also why there are multiple source ranges and only one dst range ? We
could add a bit to indicate if this range is src or dst..

One of the target use cases was ZNS in order to avoid fabric transfers during 
host GC. You can see how this plays well with several zone ranges and a single 
zone destination.

If we start getting support in Linux through the different past copy offload 
efforts, I’m sure we can extend this TP in the future.


But the "copy" command IMO is more general than the ZNS GC case, that 
can be a private case of copy, isn't it ?


We can get a big benefit of offloading the data copy from one ns to 
another in the same controller and even in different controllers in the 
same subsystem.


Do you think the extension should be to "copy" command or to create a 
new command "x_copy" for copying to different destination ns ?



  


Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-10 Thread Javier González


> On 10 Apr 2021, at 02.30, Chaitanya Kulkarni wrote:
> 
> On 4/9/21 17:22, Max Gurtovoy wrote:
>>> On 2/19/2021 2:45 PM, SelvaKumar S wrote:
>>> This patchset tries to add support for TP4065a ("Simple Copy Command"),
>>> v2020.05.04 ("Ratified")
>>> 
>>> The Specification can be found in following link.
>>> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
>>> 
>>> Simple copy command is a copy offloading operation and is  used to copy
>>> multiple contiguous ranges (source_ranges) of LBA's to a single destination
>>> LBA within the device reducing traffic between host and device.
>>> 
>>> This implementation doesn't add native copy offload support for stacked
>>> devices rather copy offload is done through emulation. Possible use
>>> cases are F2FS gc and BTRFS relocation/balance.
>>> 
>>> *blkdev_issue_copy* takes source bdev, no of sources, array of source
>>> ranges (in sectors), destination bdev and destination offset(in sectors).
>>> If both source and destination block devices are same and copy_offload = 1,
>>> then copy is done through native copy offloading. Copy emulation is used
>>> in other cases.
>>> 
>>> As SCSI XCOPY can take two different block devices and no of source range is
>>> equal to 1, this interface can be extended in future to support SCSI XCOPY.
>> Any idea why this TP wasn't designed for copy offload between 2 
>> different namespaces in the same controller ?
> 
> Yes, it was the first attempt so to keep it simple.
> 
> Further work is needed to add incremental TP so that we can also do a copy
> between the name-spaces of same controller (if we can't already) and to the
> namespaces that belongs to the different controller.
> 
>> And a simple copy will be the case where the src_nsid == dst_nsid ?
>> 
>> Also why there are multiple source ranges and only one dst range ? We 
>> could add a bit to indicate if this range is src or dst..

One of the target use cases was ZNS in order to avoid fabric transfers during 
host GC. You can see how this plays well with several zone ranges and a single 
zone destination. 

If we start getting support in Linux through the different past copy offload 
efforts, I’m sure we can extend this TP in the future. 
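
The ZNS garbage-collection case above (several source ranges, one
destination) can be pictured with a small Python sketch. It is only an
illustration: `plan_gc_copy` and its tuple layout are hypothetical names,
not part of the patchset or of the NVMe specification, and the sketch
assumes that source ranges are laid out back to back starting at the single
destination LBA.

```python
from typing import List, Tuple


def plan_gc_copy(valid_ranges: List[Tuple[int, int]],
                 dst_lba: int) -> List[Tuple[int, int, int]]:
    """Map (src_lba, nr_lbas) ranges onto one contiguous destination.

    Returns (src_lba, dst_lba, nr_lbas) triples. Each range is placed
    immediately after the previous one, so a single destination LBA is
    enough to describe where everything lands.
    """
    plan = []
    cursor = dst_lba
    for src_lba, nr_lbas in valid_ranges:
        plan.append((src_lba, cursor, nr_lbas))
        cursor += nr_lbas  # next range starts where this one ends
    return plan


# Three still-valid extents from different zones, packed into one zone
# whose write position is assumed to be LBA 1000.
for src, dst, n in plan_gc_copy([(0, 8), (100, 4), (200, 16)], 1000):
    print(f"copy {n} LBAs from {src} to {dst}")
```

Because the destination is written sequentially, this shape fits a zone's
sequential-write constraint without needing a destination list.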


Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-09 Thread Chaitanya Kulkarni
On 4/9/21 17:22, Max Gurtovoy wrote:
> On 2/19/2021 2:45 PM, SelvaKumar S wrote:
>> This patchset tries to add support for TP4065a ("Simple Copy Command"),
>> v2020.05.04 ("Ratified")
>>
>> The Specification can be found in following link.
>> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
>>
>> Simple copy command is a copy offloading operation and is  used to copy
>> multiple contiguous ranges (source_ranges) of LBA's to a single destination
>> LBA within the device reducing traffic between host and device.
>>
>> This implementation doesn't add native copy offload support for stacked
>> devices rather copy offload is done through emulation. Possible use
>> cases are F2FS gc and BTRFS relocation/balance.
>>
>> *blkdev_issue_copy* takes source bdev, no of sources, array of source
>> ranges (in sectors), destination bdev and destination offset(in sectors).
>> If both source and destination block devices are same and copy_offload = 1,
>> then copy is done through native copy offloading. Copy emulation is used
>> in other cases.
>>
>> As SCSI XCOPY can take two different block devices and no of source range is
>> equal to 1, this interface can be extended in future to support SCSI XCOPY.
> Any idea why this TP wasn't designed for copy offload between 2 
> different namespaces in the same controller ?

Yes, it was the first attempt so to keep it simple.

Further work is needed to add incremental TP so that we can also do a copy
between the name-spaces of same controller (if we can't already) and to the
namespaces that belongs to the different controller.

> And a simple copy will be the case where the src_nsid == dst_nsid ?
>
> Also why there are multiple source ranges and only one dst range ? We 
> could add a bit to indicate if this range is src or dst..
>
>





Re: [RFC PATCH v5 0/4] add simple copy support

2021-04-09 Thread Max Gurtovoy



On 2/19/2021 2:45 PM, SelvaKumar S wrote:

This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is  used to copy
multiple contiguous ranges (source_ranges) of LBA's to a single destination
LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices rather copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of source
ranges (in sectors), destination bdev and destination offset(in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and no of source range is
equal to 1, this interface can be extended in future to support SCSI XCOPY.


Any idea why this TP wasn't designed for copy offload between 2 
different namespaces in the same controller ?


And a simple copy will be the case where the src_nsid == dst_nsid ?

Also why there are multiple source ranges and only one dst range ? We 
could add a bit to indicate if this range is src or dst..





For devices supporting native simple copy, the control information is
attached as payload to the bio and submitted to the device. For devices
without native copy support, copy emulation is done by reading each source
range into memory and writing it to the destination. The caller can choose
not to try emulation when copy offload is not supported by setting the
BLKDEV_COPY_NOEMULATION flag.
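
As an illustration of the two paths described above, here is a minimal
user-space model in Python. It is not the kernel code from the patchset:
`ModelDevice`, `issue_copy`, the 512-byte sector size and the numeric value
of `COPY_NOEMULATION` are all assumptions made for this sketch. Only the
decision logic follows the text: native copy when source and destination
are the same device with offload enabled, otherwise emulation by reading
each source range into memory and writing it to the destination, unless
the caller disabled emulation.

```python
from dataclasses import dataclass
from typing import List, Tuple

SECTOR_SIZE = 512  # bytes per sector; assumed for this model

# Hypothetical stand-in for the BLKDEV_COPY_NOEMULATION flag named above;
# the numeric value is made up for illustration.
COPY_NOEMULATION = 1 << 0


@dataclass
class ModelDevice:
    """Toy block device: a bytearray plus a 'native copy enabled' bit."""
    data: bytearray
    native_copy: bool = False

    def read(self, sector: int, nr_sectors: int) -> bytes:
        start = sector * SECTOR_SIZE
        return bytes(self.data[start:start + nr_sectors * SECTOR_SIZE])

    def write(self, sector: int, buf: bytes) -> None:
        start = sector * SECTOR_SIZE
        self.data[start:start + len(buf)] = buf


def issue_copy(src: ModelDevice, ranges: List[Tuple[int, int]],
               dst: ModelDevice, dst_sector: int, flags: int = 0) -> str:
    """Copy (start_sector, nr_sectors) ranges from src to dst_sector on dst.

    Returns "native" or "emulated" to show which path was selected.
    """
    if src is dst and src.native_copy:
        # A real device would do the copy internally; the model only
        # records that the native path was chosen.
        path = "native"
    elif flags & COPY_NOEMULATION:
        raise OSError("copy offload not supported and emulation disabled")
    else:
        path = "emulated"

    # Read every source range into memory, then write the combined
    # buffer to the destination (the model reuses this for both paths).
    buf = b"".join(src.read(start, n) for start, n in ranges)
    dst.write(dst_sector, buf)
    return path
```

A caller that passes `COPY_NOEMULATION` gets an error instead of a silent
fallback and can then run its own copy logic, which is the behaviour the
flag is described as enabling.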

The following limits are added to the queue limits and are exposed in sysfs
to userspace:
- *copy_offload* controls copy offload. Set it to 0 to disable copy
offload, or to 1 to enable native copy offloading support.
- *max_copy_sectors* limits the sum of all source range lengths.
- *max_copy_nr_ranges* limits the number of source ranges.
- *max_copy_range_sectors* limits the maximum number of sectors
that can constitute a single source range.

max_copy_sectors = 0 indicates the device doesn't support copy
offloading.
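
The limits above can be read as a simple admission check on a copy request.
The Python sketch below is a model under stated assumptions, not the
patchset's blk_check_copy(): `CopyLimits` and `check_copy` are hypothetical
names, and rejecting an empty range list is an assumption of this sketch.
The zero-length check mirrors item 8 of the v4 changelog.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CopyLimits:
    """Model of the three numeric queue limits listed above."""
    max_copy_sectors: int        # cap on the sum of all range lengths
    max_copy_nr_ranges: int      # cap on the number of source ranges
    max_copy_range_sectors: int  # cap on the length of a single range


def check_copy(limits: CopyLimits, ranges: List[Tuple[int, int]]) -> bool:
    """Return True if the (start_sector, nr_sectors) ranges fit the limits."""
    if limits.max_copy_sectors == 0:
        return False  # 0 means the device does not support copy offload
    if not ranges or len(ranges) > limits.max_copy_nr_ranges:
        return False  # empty list rejected by assumption in this sketch
    total = 0
    for _start, nr_sectors in ranges:
        if nr_sectors == 0:
            return False  # a zero-length source range is an error
        if nr_sectors > limits.max_copy_range_sectors:
            return False
        total += nr_sectors
    return total <= limits.max_copy_sectors
```

A request that fails this check would either go through emulation or be
returned to the caller, depending on the flags described earlier.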

The *copy_offload* sysfs entry is configurable and can be used to toggle
between emulation and native support depending upon the use case.

Changes from v4

1. Extend dm-kcopyd to leverage copy-offload, while copying within the
same device. The other approach was to have copy-emulation by moving
dm-kcopyd to block layer. But it also required moving core dm-io infra,
causing a massive churn across multiple dm-targets.

2. Remove export in bio_map_kern()
3. Change copy_offload sysfs to accept 0 or else
4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
5. Rename payload entries, add source bdev field to be used while
partition remapping, remove copy_size
6. Change the blkdev_issue_copy() interface to accept destination and
source values in sectors rather than in bytes
7. Add payload to bio using bio_map_kern() for copy_offload case
8. Add check to return error if any source range length is 0
9. Add BLKDEV_COPY_NOEMULATION flag so the caller can skip copy
emulation in case copy offload is not supported. The caller can then use
its existing copy logic to complete the I/O.
10. Bug fix copy checks and reduce size of rcu_lock()

Planned for next:
- adding blktests
- handling larger (than device limits) copy
- decide on ioctl interface (man-page etc.)

Changes from v3

1. gfp_flag fixes.
2. Export bio_map_kern() and use it to allocate and add pages to bio.
3. Move copy offload, reading to buf, writing from buf to separate functions.
4. Send read bio of copy offload by chaining them and submit asynchronously.
5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
6. Move single source range limit check to blk_check_copy()
7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
8. Make the blkdev_issue_copy() interface generic so that it accepts a
destination bdev, to support XCOPY as well.
9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
10. Fix buf allocation logic to allocate buffer for the total size of copy.
11. Reword patch commit description.

Changes from v2

1. Add emulation support for devices not supporting copy.
2. Add *copy_offload* sysfs entry to enable and disable copy_offload
in devices supporting simple copy.
3. Remove simple copy support for stacked devices.

Changes from v1:

1. Fix memory leak in __blkdev_issue_copy
2. Unmark blk_check_copy inline
3. Fix line break in blk_check_copy_eod
4. Remove p checks and made code more readable
5. Don't use bio_set_op_attrs and remove op and set
bi_opf directly
6. Use struct_size to calculate total_size
7. Fix 

Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-23 Thread Selva Jove
Dave,

copy_file_range() support is a work in progress. The FALLOC_FL_UNSHARE
use case of fallocate() sounds interesting. I will try to address both of
them in the next series.

Adding SCSI XCOPY support is not in the scope of this patchset. However,
the blkdev_issue_copy() interface is made generic so that it can be
extended to cross-device XCOPY in the future.


Thanks,
Selva


Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-23 Thread Selva Jove
Thanks Su Yue. I'll update the link in the next series.

On Mon, Feb 22, 2021 at 12:23 PM Su Yue  wrote:
>
>
> On Fri 19 Feb 2021 at 20:45, SelvaKumar S
>  wrote:
>
> > This patchset tries to add support for TP4065a ("Simple Copy
> > Command"),
> > v2020.05.04 ("Ratified")
> >
> > The Specification can be found in following link.
> > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> >
>
> 404 not found.
> Should it be
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip
> ?
>
> > Simple copy command is a copy offloading operation and is  used
> > to copy
> > multiple contiguous ranges (source_ranges) of LBA's to a single
> > destination
> > LBA within the device reducing traffic between host and device.
> >
> > This implementation doesn't add native copy offload support for
> > stacked
> > devices rather copy offload is done through emulation. Possible
> > use
> > cases are F2FS gc and BTRFS relocation/balance.
> >
> > *blkdev_issue_copy* takes source bdev, no of sources, array of
> > source
> > ranges (in sectors), destination bdev and destination offset(in
> > sectors).
> > If both source and destination block devices are same and
> > copy_offload = 1,
> > then copy is done through native copy offloading. Copy emulation
> > is used
> > in other cases.
> >
> > As SCSI XCOPY can take two different block devices and no of
> > source range is
> > equal to 1, this interface can be extended in future to support
> > SCSI XCOPY.
> >
> > For devices supporting native simple copy, attach the control
> > information
> > as payload to the bio and submit to the device. For devices
> > without native
> > copy support, copy emulation is done by reading each source
> > range into memory
> > and writing it to the destination. Caller can choose not to try
> > emulation if copy offload is not supported by setting
> > BLKDEV_COPY_NOEMULATION flag.
> >
> > Following limits are added to queue limits and are exposed in
> > sysfs
> > to userspace
> >   - *copy_offload* controls copy_offload. set 0 to disable copy
> >   offload, 1 to enable native copy offloading support.
> >   - *max_copy_sectors* limits the sum of all source_range length
> >   - *max_copy_nr_ranges* limits the number of source ranges
> >   - *max_copy_range_sectors* limit the maximum number of sectors
> >   that can constitute a single source range.
> >
> >   max_copy_sectors = 0 indicates the device doesn't support copy
> > offloading.
> >
> >   *copy_offload* sysfs entry is configurable and can be used to
> > toggle between emulation and native support depending upon the
> > use case.
> >
> > Changes from v4
> >
> > 1. Extend dm-kcopyd to leverage copy-offload, while copying
> > within the
> > same device. The other approach was to have copy-emulation by
> > moving
> > dm-kcopyd to block layer. But it also required moving core dm-io
> > infra,
> > causing a massive churn across multiple dm-targets.
> >
> > 2. Remove export in bio_map_kern()
> > 3. Change copy_offload sysfs to accept 0 or else
> > 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
> > 5. Rename payload entries, add source bdev field to be used
> > while
> > partition remapping, remove copy_size
> > 6. Change the blkdev_issue_copy() interface to accept
> > destination and
> > source values in sector rather in bytes
> > 7. Add payload to bio using bio_map_kern() for copy_offload case
> > 8. Add check to return error if one of the source range length
> > is 0
> > 9. Add BLKDEV_COPY_NOEMULATION flag so the caller can skip copy
> > emulation in case copy offload is not supported. The caller can
> > then use its existing copy logic to complete the I/O.
> > 10. Bug fix copy checks and reduce size of rcu_lock()
> >
> > Planned for next:
> > - adding blktests
> > - handling larger (than device limits) copy
> > - decide on ioctl interface (man-page etc.)
> >
> > Changes from v3
> >
> > 1. gfp_flag fixes.
> > 2. Export bio_map_kern() and use it to allocate and add pages to
> > bio.
> > 3. Move copy offload, reading to buf, writing from buf to
> > separate functions.
> > 4. Send read bio of copy offload by chaining them and submit
> > asynchronously.
> > 5. Add gendisk->part0 and part->bd_start_sect changes to
> > blk_check_copy().
> > 6. Move single source range limit check to blk_check_copy()
> > 7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove
> > old helper.
> > 8. Make the blkdev_issue_copy() interface generic so that it
> > accepts a destination bdev, to support XCOPY as well.
> > 9. Add invalidate_kernel_vmap_range() after reading data for
> > vmalloc'ed memory.
> > 10. Fix buf allocation logic to allocate buffer for the total
> > size of copy.
> > 11. Reword patch commit description.
> >
> > Changes from v2
> >
> > 1. Add emulation support for devices not supporting copy.
> > 2. Add *copy_offload* sysfs entry to enable and disable
> > copy_offload
> >   in devices 

Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-21 Thread Su Yue



On Fri 19 Feb 2021 at 20:45, SelvaKumar S wrote:


This patchset tries to add support for TP4065a ("Simple Copy 
Command"),

v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip



404 not found.
Should it be
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip
?

Simple copy command is a copy offloading operation and is  used 
to copy
multiple contiguous ranges (source_ranges) of LBA's to a single 
destination

LBA within the device reducing traffic between host and device.

This implementation doesn't add native copy offload support for 
stacked
devices rather copy offload is done through emulation. Possible 
use

cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, no of sources, array of 
source
ranges (in sectors), destination bdev and destination offset(in 
sectors).
If both source and destination block devices are same and 
copy_offload = 1,
then copy is done through native copy offloading. Copy emulation 
is used

in other cases.

As SCSI XCOPY can take two different block devices and no of 
source range is
equal to 1, this interface can be extended in future to support 
SCSI XCOPY.


For devices supporting native simple copy, attach the control 
information
as payload to the bio and submit to the device. For devices 
without native
copy support, copy emulation is done by reading each source 
range into memory

and writing it to the destination. Caller can choose not to try
emulation if copy offload is not supported by setting
BLKDEV_COPY_NOEMULATION flag.

Following limits are added to queue limits and are exposed in 
sysfs

to userspace
- *copy_offload* controls copy_offload. set 0 to disable copy
offload, 1 to enable native copy offloading support.
- *max_copy_sectors* limits the sum of all source_range length
- *max_copy_nr_ranges* limits the number of source ranges
- *max_copy_range_sectors* limit the maximum number of sectors
that can constitute a single source range.

max_copy_sectors = 0 indicates the device doesn't support copy
offloading.

The *copy_offload* sysfs entry is configurable and can be used to toggle
between emulation and native support depending upon the use case.

Changes from v4

1. Extend dm-kcopyd to leverage copy-offload, while copying 
within the
same device. The other approach was to have copy-emulation by 
moving
dm-kcopyd to block layer. But it also required moving core dm-io 
infra,

causing a massive churn across multiple dm-targets.

2. Remove export in bio_map_kern()
3. Change copy_offload sysfs to accept 0 or else
4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
5. Rename payload entries, add source bdev field to be used 
while

partition remapping, remove copy_size
6. Change the blkdev_issue_copy() interface to accept 
destination and

source values in sector rather in bytes
7. Add payload to bio using bio_map_kern() for copy_offload case
8. Add check to return error if one of the source range length 
is 0
9. Add BLKDEV_COPY_NOEMULATION flag so the caller can skip copy
emulation in case copy offload is not supported. The caller can then use
its existing copy logic to complete the I/O.
10. Bug fix copy checks and reduce size of rcu_lock()

Planned for next:
- adding blktests
- handling larger (than device limits) copy
- decide on ioctl interface (man-page etc.)

Changes from v3

1. gfp_flag fixes.
2. Export bio_map_kern() and use it to allocate and add pages to bio.
3. Move copy offload, reading to buf, writing from buf to separate functions.
4. Send read bio of copy offload by chaining them and submit asynchronously.
5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
6. Move single source range limit check to blk_check_copy()
7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
8. Make blkdev_issue_copy() interface generic to accept destination bdev
to support XCOPY as well.
9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
10. Fix buf allocation logic to allocate buffer for the total size of copy.
11. Reword patch commit description.

Changes from v2

1. Add emulation support for devices not supporting copy.
2. Add *copy_offload* sysfs entry to enable and disable copy_offload
in devices supporting simple copy.
3. Remove simple copy support for stacked devices.

Changes from v1:

1. Fix memory leak in __blkdev_issue_copy
2. Unmark blk_check_copy inline
3. Fix line break in blk_check_copy_eod
4. Remove p checks and made code more readable
5. Don't use bio_set_op_attrs and remove op and set
   bi_opf directly
6. Use struct_size to calculate total_size
7. Fix partition remap of copy destination
8. Remove mcl,mssrl,msrc from nvme_ns
9. Initialize copy queue limits to 0 in nvme_config_copy
10. Remove return in 

Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-21 Thread Ming Lei
On Fri, Feb 19, 2021 at 06:15:13PM +0530, SelvaKumar S wrote:
> This patchset tries to add support for TP4065a ("Simple Copy Command"),
> v2020.05.04 ("Ratified")
> 
> The Specification can be found in following link.
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> 
> Simple copy command is a copy offloading operation and is  used to copy
> multiple contiguous ranges (source_ranges) of LBA's to a single destination
> LBA within the device reducing traffic between host and device.
> 
> This implementation doesn't add native copy offload support for stacked
> devices rather copy offload is done through emulation. Possible use
> cases are F2FS gc and BTRFS relocation/balance.
> 
> *blkdev_issue_copy* takes source bdev, no of sources, array of source
> ranges (in sectors), destination bdev and destination offset(in sectors).
> If both source and destination block devices are same and copy_offload = 1,
> then copy is done through native copy offloading. Copy emulation is used
> in other cases.
> 
> As SCSI XCOPY can take two different block devices and no of source range is
> equal to 1, this interface can be extended in future to support SCSI XCOPY.

The patchset adds ioctl(BLKCOPY) and two userspace visible data
structures (range_entry and copy_range). All of these belong to kabi stuff,
and the interface is generic block layer kabi.

The API has to be extensible to support SCSI XCOPY or similar block copy
commands in future without breaking existing applications, so please CC
linux-scsi and scsi guys in your next post.


-- 
Ming



Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-21 Thread Dave Chinner
On Fri, Feb 19, 2021 at 06:15:13PM +0530, SelvaKumar S wrote:
> This patchset tries to add support for TP4065a ("Simple Copy Command"),
> v2020.05.04 ("Ratified")
> 
> The Specification can be found in following link.
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> 
> Simple copy command is a copy offloading operation and is  used to copy
> multiple contiguous ranges (source_ranges) of LBA's to a single destination
> LBA within the device reducing traffic between host and device.
> 
> This implementation doesn't add native copy offload support for stacked
> devices rather copy offload is done through emulation. Possible use
> cases are F2FS gc and BTRFS relocation/balance.

It sounds like you are missing the most obvious use case for this:
hooking up filesystem copy_file_range() implementations to allow
userspace to offload user data copies to hardware.

Another fs level feature that could use this for hardware
acceleration is fallocate(FALLOC_FL_UNSHARE).

These are probably going to be far easier to hook up than filesystem
GC algorithms, and there is also solid data integrity and stress
testing infrastructure for these operations via fstests.

> As SCSI XCOPY can take two different block devices and no of source range is
> equal to 1, this interface can be extended in future to support SCSI XCOPY.

That greatly complicates the implementation. Do we even care about
cross-device XCOPY at this point?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-20 Thread Keith Busch
On Sat, Feb 20, 2021 at 06:01:56PM +, David Laight wrote:
> From: SelvaKumar S
> > Sent: 19 February 2021 12:45
> > 
> > This patchset tries to add support for TP4065a ("Simple Copy Command"),
> > v2020.05.04 ("Ratified")
> > 
> > The Specification can be found in following link.
> > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> > 
> > Simple copy command is a copy offloading operation and is  used to copy
> > multiple contiguous ranges (source_ranges) of LBA's to a single destination
> > LBA within the device reducing traffic between host and device.
> 
> Sounds to me like the real reason is that the copy just ends up changing
> some indirect block pointers rather than having to actually copy the data.

I guess an implementation could do that, but I think that's missing the
point of the command. The intention is to copy the data to a new
location on the media for host managed garbage collection. 


Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-20 Thread Matthew Wilcox
On Sat, Feb 20, 2021 at 06:01:56PM +, David Laight wrote:
> From: SelvaKumar S
> > Sent: 19 February 2021 12:45
> > 
> > This patchset tries to add support for TP4065a ("Simple Copy Command"),
> > v2020.05.04 ("Ratified")
> > 
> > The Specification can be found in following link.
> > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> > 
> > Simple copy command is a copy offloading operation and is  used to copy
> > multiple contiguous ranges (source_ranges) of LBA's to a single destination
> > LBA within the device reducing traffic between host and device.
> 
> Sounds to me like the real reason is that the copy just ends up changing
> some indirect block pointers rather than having to actually copy the data.

That would be incorrect, at least for firmware that I have knowledge of.
There are checksums which involve the logical block address of the data,
and you can't just rewrite the checksum on NAND, you have to write the
entire block.

Now, firmware doesn't have to implement their checksum like this,
but there are good reasons to do it this way (eg if the command gets
corrupted in transfer and you read the wrong block, it will fail the
checksum, preventing the drive from returning Somebody Else's Data).

So let's take these people at their word.  It is to reduce traffic
between drive and host.  And that is a good enough reason to do it.


RE: [RFC PATCH v5 0/4] add simple copy support

2021-02-20 Thread David Laight
From: SelvaKumar S
> Sent: 19 February 2021 12:45
> 
> This patchset tries to add support for TP4065a ("Simple Copy Command"),
> v2020.05.04 ("Ratified")
> 
> The Specification can be found in following link.
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip
> 
> Simple copy command is a copy offloading operation and is  used to copy
> multiple contiguous ranges (source_ranges) of LBA's to a single destination
> LBA within the device reducing traffic between host and device.

Sounds to me like the real reason is that the copy just ends up changing
some indirect block pointers rather than having to actually copy the data.

David



[RFC PATCH v5 0/4] add simple copy support

2021-02-19 Thread SelvaKumar S
This patchset tries to add support for TP4065a ("Simple Copy Command"),
v2020.05.04 ("Ratified")

The Specification can be found in following link.
https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip

Simple copy command is a copy offloading operation and is used to copy
multiple contiguous ranges (source_ranges) of LBAs to a single destination
LBA within the device, reducing traffic between host and device.

This implementation doesn't add native copy offload support for stacked
devices; rather, copy offload is done through emulation. Possible use
cases are F2FS gc and BTRFS relocation/balance.

*blkdev_issue_copy* takes source bdev, number of sources, array of source
ranges (in sectors), destination bdev and destination offset (in sectors).
If both source and destination block devices are same and copy_offload = 1,
then copy is done through native copy offloading. Copy emulation is used
in other cases.

As SCSI XCOPY can take two different block devices and a single source
range, this interface can be extended in future to support SCSI XCOPY.

For devices supporting native simple copy, attach the control information
as payload to the bio and submit to the device. For devices without native
copy support, copy emulation is done by reading each source range into memory
and writing it to the destination. The caller can choose not to try
emulation if copy offload is not supported by setting the
BLKDEV_COPY_NOEMULATION flag.
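
The path selection and the emulation fallback described above can be
sketched in userspace C. This is only an illustration under stated
assumptions, not the patchset's code: the "device" is a plain memory
buffer, and struct range, pick_copy_path() and emulate_copy() are
hypothetical names. The single bounce buffer sized for the whole copy
follows the "allocate buffer for the total size of copy" changelog entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical source range in sectors; not the patchset's struct layout. */
struct range {
	uint64_t start;
	uint64_t len;
};

enum copy_path { COPY_NATIVE, COPY_EMULATED, COPY_REJECTED };

/* Native offload only for same-device copies with copy_offload = 1;
 * otherwise emulate, unless the caller asked for no emulation. */
static enum copy_path pick_copy_path(bool same_bdev, bool copy_offload,
				     bool noemulation)
{
	if (same_bdev && copy_offload)
		return COPY_NATIVE;
	return noemulation ? COPY_REJECTED : COPY_EMULATED;
}

/*
 * Emulate a copy on an in-memory "device" of dev_sectors sectors:
 * read every source range into one bounce buffer, then write the buffer
 * contiguously at sector dst. Returns 0 on success, -1 on bad input or
 * allocation failure.
 */
static int emulate_copy(uint8_t *dev, uint64_t dev_sectors,
			const struct range *src, size_t nr, uint64_t dst)
{
	uint64_t total = 0, off = 0;
	uint8_t *buf;
	size_t i;

	assert(dev != NULL && src != NULL);

	for (i = 0; i < nr; i++) {
		/* Reject empty ranges and ranges past the end of the device. */
		if (src[i].len == 0 || src[i].start >= dev_sectors ||
		    src[i].len > dev_sectors - src[i].start)
			return -1;
		/* The sum can never fit if it exceeds the device size. */
		if (src[i].len > dev_sectors - total)
			return -1;
		total += src[i].len;
	}
	if (total == 0 || dst >= dev_sectors || total > dev_sectors - dst)
		return -1;

	buf = malloc((size_t)total * SECTOR_SIZE);
	if (buf == NULL)
		return -1;

	/* Read phase: gather all source ranges back to back. */
	for (i = 0; i < nr; i++) {
		memcpy(buf + off * SECTOR_SIZE,
		       dev + src[i].start * SECTOR_SIZE,
		       (size_t)src[i].len * SECTOR_SIZE);
		off += src[i].len;
	}

	/* Write phase: one contiguous write at the destination. */
	memcpy(dev + dst * SECTOR_SIZE, buf, (size_t)total * SECTOR_SIZE);
	free(buf);
	return 0;
}
```

Gathering everything before writing keeps the result well defined even
when a source range overlaps the destination.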

Following limits are added to queue limits and are exposed in sysfs
to userspace
- *copy_offload* controls copy_offload. Set 0 to disable copy
offload, 1 to enable native copy offloading support.
- *max_copy_sectors* limits the sum of all source_range lengths
- *max_copy_nr_ranges* limits the number of source ranges
- *max_copy_range_sectors* limits the maximum number of sectors
that can constitute a single source range.

max_copy_sectors = 0 indicates the device doesn't support copy
offloading.
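
As a rough illustration of how these limits constrain a request, here is
a small userspace C check. It is a sketch under assumptions:
struct copy_limits, struct src_range and copy_fits_limits() are
hypothetical names that only mirror the sysfs limits listed above, and
the zero-length rejection follows the "return error if one of the source
range length is 0" changelog entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the three limits named above. */
struct copy_limits {
	uint64_t max_copy_sectors;       /* cap on the sum of all range lengths */
	uint32_t max_copy_nr_ranges;     /* cap on the number of source ranges */
	uint64_t max_copy_range_sectors; /* cap on one source range length */
};

/* Hypothetical source range, expressed in sectors. */
struct src_range {
	uint64_t start;
	uint64_t len;
};

/* Returns 1 if the request fits within the limits, 0 otherwise. */
static int copy_fits_limits(const struct copy_limits *lim,
			    const struct src_range *r, size_t nr)
{
	uint64_t total = 0;
	size_t i;

	assert(lim != NULL && (r != NULL || nr == 0));

	/* max_copy_sectors = 0 means no copy offload support at all. */
	if (lim->max_copy_sectors == 0)
		return 0;
	if (nr == 0 || nr > lim->max_copy_nr_ranges)
		return 0;

	for (i = 0; i < nr; i++) {
		if (r[i].len == 0 || r[i].len > lim->max_copy_range_sectors)
			return 0;
		/* total never exceeds max_copy_sectors, so this cannot wrap. */
		if (r[i].len > lim->max_copy_sectors - total)
			return 0;
		total += r[i].len;
	}
	return 1;
}
```

A request that fails such a check would presumably have to be split or
emulated; handling copies larger than the device limits is listed under
"Planned for next".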

*copy offload* sysfs entry is configurable and can be used to toggle
between emulation and native support depending upon the usecase.

Changes from v4

1. Extend dm-kcopyd to leverage copy-offload, while copying within the
same device. The other approach was to have copy-emulation by moving
dm-kcopyd to block layer. But it also required moving core dm-io infra,
causing a massive churn across multiple dm-targets.

2. Remove export in bio_map_kern()
3. Change copy_offload sysfs to accept 0 or else
4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY
5. Rename payload entries, add source bdev field to be used while
partition remapping, remove copy_size
6. Change the blkdev_issue_copy() interface to accept destination and
source values in sectors rather than in bytes
7. Add payload to bio using bio_map_kern() for copy_offload case
8. Add check to return error if one of the source range length is 0
9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy
emulation in case copy offload is not supported. Caller can use
his existing copying logic to complete the io.
10. Bug fix copy checks and reduce size of rcu_lock()

Planned for next:
- adding blktests
- handling larger (than device limits) copy
- decide on ioctl interface (man-page etc.)

Changes from v3

1. gfp_flag fixes.
2. Export bio_map_kern() and use it to allocate and add pages to bio.
3. Move copy offload, reading to buf, writing from buf to separate functions.
4. Send read bio of copy offload by chaining them and submit asynchronously.
5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy().
6. Move single source range limit check to blk_check_copy()
7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper.
8. Make blkdev_issue_copy() interface generic to accept destination bdev
to support XCOPY as well.
9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory.
10. Fix buf allocation logic to allocate buffer for the total size of copy.
11. Reword patch commit description.

Changes from v2

1. Add emulation support for devices not supporting copy.
2. Add *copy_offload* sysfs entry to enable and disable copy_offload
in devices supporting simple copy.
3. Remove simple copy support for stacked devices.

Changes from v1:

1. Fix memory leak in __blkdev_issue_copy
2. Unmark blk_check_copy inline
3. Fix line break in blk_check_copy_eod
4. Remove p checks and made code more readable
5. Don't use bio_set_op_attrs and remove op and set
   bi_opf directly
6. Use struct_size to calculate total_size
7. Fix partition remap of copy destination
8. Remove mcl,mssrl,msrc from nvme_ns
9. Initialize copy queue limits to 0 in nvme_config_copy
10. Remove return in QUEUE_FLAG_COPY check
11. Remove unused OCFS

SelvaKumar S (4):
  block: make bio_map_kern() non static
  block: add simple copy support
  nvme: add simple copy support
  dm kcopyd: add simple copy offload support