Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-04-14 Thread Selva Jove
I agree with you. Will remove BLKDEV_COPY_NOEMULATION.

On Tue, Apr 13, 2021 at 6:03 AM Damien Le Moal  wrote:
>
> On 2021/04/12 23:35, Selva Jove wrote:
> > On Mon, Apr 12, 2021 at 5:55 AM Damien Le Moal wrote:
> >>
> >> On 2021/04/07 20:33, Selva Jove wrote:
> >>> Initially I started moving the dm-kcopyd interface to the block layer
> >>> as a generic interface.
> >>> Once I dig deeper in dm-kcopyd code, I figured that dm-kcopyd is
> >>> tightly coupled with dm_io()
> >>>
> >>> To move dm-kcopyd to block layer, it would also require dm_io code to
> >>> be moved to block layer.
> >>> It would cause havoc in dm layer, as it is the backbone of the
> >>> dm-layer and needs complete
> >>> rewriting of dm-layer. Do you see any other way of doing this without
> >>> having to move dm_io code
> >>> or to have redundant code ?
> >>
> >> Right. Missed that. So reusing dm-kcopyd and making it a common interface will
> >> take some more efforts. OK, then. For the first round of commits, let's forget
> >> about this. But I still think that your emulation could be a lot better than a
> >> loop doing blocking writes after blocking reads.
> >>
> >
> > The current implementation issues reads asynchronously and, once all the reads
> > have completed, issues the write as a whole to reduce the I/O traffic in the
> > queue. I agree that things can be better. Will explore another approach of
> > sending writes immediately once reads are completed, with plugging to increase
> > the chances of merging.
> >
> >> [...]
> > +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> > + struct range_entry *src_rlist, struct block_device *dest_bdev,
> > + sector_t dest, gfp_t gfp_mask, int flags)
> > +{
> > + struct request_queue *q = bdev_get_queue(src_bdev);
> > + struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> > + struct blk_copy_payload *payload;
> > + sector_t bs_mask, copy_size;
> > + int ret;
> > +
> > + ret = blk_prepare_payload(src_bdev, nr_srcs, src_rlist, gfp_mask,
> > + &payload, &copy_size);
> > + if (ret)
> > + return ret;
> > +
> > + bs_mask = (bdev_logical_block_size(dest_bdev) >> 9) - 1;
> > + if (dest & bs_mask) {
> > + return -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (q == dest_q && q->limits.copy_offload) {
> > + ret = blk_copy_offload(src_bdev, payload, dest, gfp_mask);
> > + if (ret)
> > + goto out;
> > + } else if (flags & BLKDEV_COPY_NOEMULATION) {
> 
>  Why ? whoever calls blkdev_issue_copy() wants a copy to be done. Why would that
>  user say "Fail on me if the device does not support copy" ??? This is a weird
>  interface in my opinion.
> 
> >>>
> >>> The BLKDEV_COPY_NOEMULATION flag was introduced to allow blkdev_issue_copy()
> >>> callers to use their native copying method instead of the emulated copy that
> >>> I added. This way we ensure that dm uses the hw-assisted copy and, if that is
> >>> not present, falls back to the existing copy method.
> >>>
> >>> The other users who don't have their native emulation can use this
> >>> emulated-copy implementation.
> >>
> >> I do not understand. Emulation or not should be entirely driven by the device
> >> reporting support for simple copy (or not). It does not matter which component
> >> is issuing the simple copy call: an FS to a real device, and FS to a DM device
> >> or a DM target driver. If the underlying device reported support for simple
> >> copy, use that. Otherwise, emulate with read/write. What am I missing here ?
> >>
> >
> > The blkdev_issue_copy() API will generally complete the copy operation, either
> > by using offloaded copy or by using emulated copy. The caller of the API is not
> > required to figure out the type of support. However, it can opt out of emulated
> > copy by specifying the flag BLKDEV_COPY_NOEMULATION. This is helpful when the
> > caller already has a sophisticated emulation (e.g. dm-kcopyd users).
>
> This does not make any sense to me. If the user already has another means of
> doing copies, then that user will not call blkdev_issue_copy(). So I really do
> not understand what the "opting out of emulated copy" would be useful for. That
> user can check the simple copy support flag in the device request queue and act
> accordingly: use its own block copy code when simple copy is not supported, or
> use blkdev_issue_copy() when the device has simple copy. Adding that
> BLKDEV_COPY_NOEMULATION does not serve any purpose at all.
>
>
>
> --
> Damien Le Moal
> Western Digital Research


Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-04-12 Thread Damien Le Moal
On 2021/04/12 23:35, Selva Jove wrote:
> On Mon, Apr 12, 2021 at 5:55 AM Damien Le Moal  wrote:
>>
>> On 2021/04/07 20:33, Selva Jove wrote:
>>> Initially I started moving the dm-kcopyd interface to the block layer
>>> as a generic interface.
>>> Once I dig deeper in dm-kcopyd code, I figured that dm-kcopyd is
>>> tightly coupled with dm_io()
>>>
>>> To move dm-kcopyd to block layer, it would also require dm_io code to
>>> be moved to block layer.
>>> It would cause havoc in dm layer, as it is the backbone of the
>>> dm-layer and needs complete
>>> rewriting of dm-layer. Do you see any other way of doing this without
>>> having to move dm_io code
>>> or to have redundant code ?
>>
>> Right. Missed that. So reusing dm-kcopyd and making it a common interface will
>> take some more efforts. OK, then. For the first round of commits, let's forget
>> about this. But I still think that your emulation could be a lot better than a
>> loop doing blocking writes after blocking reads.
>>
> 
> The current implementation issues reads asynchronously and, once all the reads
> have completed, issues the write as a whole to reduce the I/O traffic in the
> queue. I agree that things can be better. Will explore another approach of
> sending writes immediately once reads are completed, with plugging to increase
> the chances of merging.
> 
>> [...]
> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> + struct range_entry *src_rlist, struct block_device *dest_bdev,
> + sector_t dest, gfp_t gfp_mask, int flags)
> +{
> + struct request_queue *q = bdev_get_queue(src_bdev);
> + struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> + struct blk_copy_payload *payload;
> + sector_t bs_mask, copy_size;
> + int ret;
> +
> + ret = blk_prepare_payload(src_bdev, nr_srcs, src_rlist, gfp_mask,
> + &payload, &copy_size);
> + if (ret)
> + return ret;
> +
> + bs_mask = (bdev_logical_block_size(dest_bdev) >> 9) - 1;
> + if (dest & bs_mask) {
> + return -EINVAL;
> + goto out;
> + }
> +
> + if (q == dest_q && q->limits.copy_offload) {
> + ret = blk_copy_offload(src_bdev, payload, dest, gfp_mask);
> + if (ret)
> + goto out;
> + } else if (flags & BLKDEV_COPY_NOEMULATION) {

 Why ? whoever calls blkdev_issue_copy() wants a copy to be done. Why would that
 user say "Fail on me if the device does not support copy" ??? This is a weird
 interface in my opinion.

>>>
>>> The BLKDEV_COPY_NOEMULATION flag was introduced to allow blkdev_issue_copy()
>>> callers to use their native copying method instead of the emulated copy that
>>> I added. This way we ensure that dm uses the hw-assisted copy and, if that is
>>> not present, falls back to the existing copy method.
>>>
>>> The other users who don't have their native emulation can use this
>>> emulated-copy implementation.
>>
>> I do not understand. Emulation or not should be entirely driven by the device
>> reporting support for simple copy (or not). It does not matter which component
>> is issuing the simple copy call: an FS to a real device, and FS to a DM device
>> or a DM target driver. If the underlying device reported support for simple
>> copy, use that. Otherwise, emulate with read/write. What am I missing here ?
>>
> 
> The blkdev_issue_copy() API will generally complete the copy operation, either
> by using offloaded copy or by using emulated copy. The caller of the API is not
> required to figure out the type of support. However, it can opt out of emulated
> copy by specifying the flag BLKDEV_COPY_NOEMULATION. This is helpful when the
> caller already has a sophisticated emulation (e.g. dm-kcopyd users).

This does not make any sense to me. If the user already has another means of
doing copies, then that user will not call blkdev_issue_copy(). So I really do
not understand what the "opting out of emulated copy" would be useful for. That
user can check the simple copy support flag in the device request queue and act
accordingly: use its own block copy code when simple copy is not supported, or
use blkdev_issue_copy() when the device has simple copy. Adding that
BLKDEV_COPY_NOEMULATION does not serve any purpose at all.
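
For illustration only, the caller-side pattern described above might look
roughly like the sketch below, using the copy_offload queue limit added by
this patch; my_own_copy_fallback() is a hypothetical caller-provided path
(e.g. dm-kcopyd for dm), not something from the patch:

	struct request_queue *q = bdev_get_queue(bdev);
	int ret;

	if (q->limits.copy_offload) {
		/* device supports simple copy: let the block layer issue it */
		ret = blkdev_issue_copy(bdev, nr_srcs, src_ranges, bdev,
					dest, GFP_KERNEL, 0);
	} else {
		/* no hardware support: use the caller's own copy machinery */
		ret = my_own_copy_fallback(bdev, nr_srcs, src_ranges, dest);
	}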



-- 
Damien Le Moal
Western Digital Research


Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-04-12 Thread Selva Jove
On Mon, Apr 12, 2021 at 5:55 AM Damien Le Moal  wrote:
>
> On 2021/04/07 20:33, Selva Jove wrote:
> > Initially I started moving the dm-kcopyd interface to the block layer
> > as a generic interface.
> > Once I dig deeper in dm-kcopyd code, I figured that dm-kcopyd is
> > tightly coupled with dm_io()
> >
> > To move dm-kcopyd to block layer, it would also require dm_io code to
> > be moved to block layer.
> > It would cause havoc in dm layer, as it is the backbone of the
> > dm-layer and needs complete
> > rewriting of dm-layer. Do you see any other way of doing this without
> > having to move dm_io code
> > or to have redundant code ?
>
> Right. Missed that. So reusing dm-kcopyd and making it a common interface will
> take some more efforts. OK, then. For the first round of commits, let's forget
> about this. But I still think that your emulation could be a lot better than a
> loop doing blocking writes after blocking reads.
>

The current implementation issues reads asynchronously and, once all the reads
have completed, issues the write as a whole to reduce the I/O traffic in the
queue. I agree that things can be better. Will explore another approach of
sending writes immediately once reads are completed, with plugging to increase
the chances of merging.
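
For illustration, a rough sketch of what the write side of that approach could
look like, assuming the write bios for each source range have already been
prepared (write_bio[] and nr_ranges are hypothetical names, not from the patch):

	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr_ranges; i++) {
		/*
		 * write_bio[i] carries the data read back from source range i.
		 * Submitting the writes back-to-back under a plug gives the
		 * block layer a chance to merge adjacent writes at unplug.
		 */
		submit_bio(write_bio[i]);
	}
	blk_finish_plug(&plug);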

> [...]
> >>> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
> >>> + struct range_entry *src_rlist, struct block_device *dest_bdev,
> >>> + sector_t dest, gfp_t gfp_mask, int flags)
> >>> +{
> >>> + struct request_queue *q = bdev_get_queue(src_bdev);
> >>> + struct request_queue *dest_q = bdev_get_queue(dest_bdev);
> >>> + struct blk_copy_payload *payload;
> >>> + sector_t bs_mask, copy_size;
> >>> + int ret;
> >>> +
> >>> + ret = blk_prepare_payload(src_bdev, nr_srcs, src_rlist, gfp_mask,
> >>> + &payload, &copy_size);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + bs_mask = (bdev_logical_block_size(dest_bdev) >> 9) - 1;
> >>> + if (dest & bs_mask) {
> >>> + return -EINVAL;
> >>> + goto out;
> >>> + }
> >>> +
> >>> + if (q == dest_q && q->limits.copy_offload) {
> >>> + ret = blk_copy_offload(src_bdev, payload, dest, gfp_mask);
> >>> + if (ret)
> >>> + goto out;
> >>> + } else if (flags & BLKDEV_COPY_NOEMULATION) {
> >>
> >> Why ? whoever calls blkdev_issue_copy() wants a copy to be done. Why would that
> >> user say "Fail on me if the device does not support copy" ??? This is a weird
> >> interface in my opinion.
> >>
> >
> > The BLKDEV_COPY_NOEMULATION flag was introduced to allow blkdev_issue_copy()
> > callers to use their native copying method instead of the emulated copy that
> > I added. This way we ensure that dm uses the hw-assisted copy and, if that is
> > not present, falls back to the existing copy method.
> >
> > The other users who don't have their native emulation can use this
> > emulated-copy implementation.
>
> I do not understand. Emulation or not should be entirely driven by the device
> reporting support for simple copy (or not). It does not matter which component
> is issuing the simple copy call: an FS to a real device, and FS to a DM device
> or a DM target driver. If the underlying device reported support for simple
> copy, use that. Otherwise, emulate with read/write. What am I missing here ?
>

The blkdev_issue_copy() API will generally complete the copy operation, either
by using offloaded copy or by using emulated copy. The caller of the API is not
required to figure out the type of support. However, it can opt out of emulated
copy by specifying the flag BLKDEV_COPY_NOEMULATION. This is helpful when the
caller already has a sophisticated emulation (e.g. dm-kcopyd users). An
illustrative sketch of that opt-out usage follows below.
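
Sketch only: my_native_copy() is a hypothetical caller-side helper, and the
exact error returned when offload is unavailable is an assumption, not taken
from the patch:

	ret = blkdev_issue_copy(src_bdev, nr_srcs, src_ranges, dest_bdev,
				dest, GFP_KERNEL, BLKDEV_COPY_NOEMULATION);
	if (ret) {
		/* offload not available (or failed): use the native path */
		ret = my_native_copy(src_bdev, nr_srcs, src_ranges,
				     dest_bdev, dest);
	}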

>
> [...]
> >>> @@ -565,6 +569,12 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >>>   if (b->chunk_sectors)
> >>>   t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
> >>>
> >>> + /* simple copy not supported in stacked devices */
> >>> + t->copy_offload = 0;
> >>> + t->max_copy_sectors = 0;
> >>> + t->max_copy_range_sectors = 0;
> >>> + t->max_copy_nr_ranges = 0;
> >>
> >> You do not need this. Limits not explicitly initialized are 0 already.
> >> But I do not see why you can't support copy on stacked devices. That should be
> >> feasible taking the min() for each of the above limits.
> >>
> >
> > Disabling stacked device support was feedback from v2.
> >
> > https://patchwork.kernel.org/project/linux-block/patch/20201204094659.12732-2-selvakuma...@samsung.com/
>
> Right. But the initialization to 0 is still not needed. The fields are already
> initialized to 0.
>
>
> --
> Damien Le Moal
> Western Digital Research


Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-04-11 Thread Damien Le Moal
On 2021/04/07 20:33, Selva Jove wrote:
> Initially I started moving the dm-kcopyd interface to the block layer
> as a generic interface.
> Once I dig deeper in dm-kcopyd code, I figured that dm-kcopyd is
> tightly coupled with dm_io()
> 
> To move dm-kcopyd to block layer, it would also require dm_io code to
> be moved to block layer.
> It would cause havoc in dm layer, as it is the backbone of the
> dm-layer and needs complete
> rewriting of dm-layer. Do you see any other way of doing this without
> having to move dm_io code
> or to have redundant code ?

Right. Missed that. So reusing dm-kcopyd and making it a common interface will
take some more efforts. OK, then. For the first round of commits, let's forget
about this. But I still think that your emulation could be a lot better than a
loop doing blocking writes after blocking reads.

[...]
>>> +int blkdev_issue_copy(struct block_device *src_bdev, int nr_srcs,
>>> + struct range_entry *src_rlist, struct block_device *dest_bdev,
>>> + sector_t dest, gfp_t gfp_mask, int flags)
>>> +{
>>> + struct request_queue *q = bdev_get_queue(src_bdev);
>>> + struct request_queue *dest_q = bdev_get_queue(dest_bdev);
>>> + struct blk_copy_payload *payload;
>>> + sector_t bs_mask, copy_size;
>>> + int ret;
>>> +
>>> + ret = blk_prepare_payload(src_bdev, nr_srcs, src_rlist, gfp_mask,
>>> + &payload, &copy_size);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + bs_mask = (bdev_logical_block_size(dest_bdev) >> 9) - 1;
>>> + if (dest & bs_mask) {
>>> + return -EINVAL;
>>> + goto out;
>>> + }
>>> +
>>> + if (q == dest_q && q->limits.copy_offload) {
>>> + ret = blk_copy_offload(src_bdev, payload, dest, gfp_mask);
>>> + if (ret)
>>> + goto out;
>>> + } else if (flags & BLKDEV_COPY_NOEMULATION) {
>>
>> Why ? whoever calls blkdev_issue_copy() wants a copy to be done. Why would that
>> user say "Fail on me if the device does not support copy" ??? This is a weird
>> interface in my opinion.
>>
> 
> The BLKDEV_COPY_NOEMULATION flag was introduced to allow blkdev_issue_copy()
> callers to use their native copying method instead of the emulated copy that
> I added. This way we ensure that dm uses the hw-assisted copy and, if that is
> not present, falls back to the existing copy method.
> 
> The other users who don't have their native emulation can use this
> emulated-copy implementation.

I do not understand. Emulation or not should be entirely driven by the device
reporting support for simple copy (or not). It does not matter which component
is issuing the simple copy call: an FS to a real device, and FS to a DM device
or a DM target driver. If the underlying device reported support for simple
copy, use that. Otherwise, emulate with read/write. What am I missing here ?

[...]
>>> @@ -565,6 +569,12 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>>   if (b->chunk_sectors)
>>>   t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
>>>
>>> + /* simple copy not supported in stacked devices */
>>> + t->copy_offload = 0;
>>> + t->max_copy_sectors = 0;
>>> + t->max_copy_range_sectors = 0;
>>> + t->max_copy_nr_ranges = 0;
>>
>> You do not need this. Limits not explicitly initialized are 0 already.
>> But I do not see why you can't support copy on stacked devices. That should be
>> feasible taking the min() for each of the above limits.
>>
> 
> Disabling stacked device support was feedback from v2.
> 
> https://patchwork.kernel.org/project/linux-block/patch/20201204094659.12732-2-selvakuma...@samsung.com/

Right. But the initialization to 0 is still not needed. The fields are already
initialized to 0.
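
For what it's worth, a minimal sketch of that min()-based stacking inside
blk_stack_limits(), using the limit names from this patch (how copy_offload
itself should be combined is a separate design choice):

	t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
	t->max_copy_range_sectors = min(t->max_copy_range_sectors,
					b->max_copy_range_sectors);
	t->max_copy_nr_ranges = min(t->max_copy_nr_ranges,
				    b->max_copy_nr_ranges);
	/* offload only if every layer in the stack can offload */
	t->copy_offload = min(t->copy_offload, b->copy_offload);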


-- 
Damien Le Moal
Western Digital Research


Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-04-07 Thread Selva Jove
Initially I started moving the dm-kcopyd interface to the block layer
as a generic interface. Once I dug deeper into the dm-kcopyd code, I figured
out that dm-kcopyd is tightly coupled with dm_io().

To move dm-kcopyd to the block layer, the dm_io code would also have to be
moved to the block layer. That would cause havoc in the dm layer, as dm_io is
its backbone, and it would need a complete rewrite of the dm layer. Do you see
any other way of doing this without having to move the dm_io code or having
redundant code?


On Sat, Feb 20, 2021 at 10:29 AM Damien Le Moal  wrote:
>
> On 2021/02/20 11:01, SelvaKumar S wrote:
> > Add new BLKCOPY ioctl that offloads copying of one or more sources
> > ranges to a destination in the device. Accepts a 'copy_range' structure
> > that contains destination (in sectors), no of sources and pointer to the
> > array of source ranges. Each source range is represented by 'range_entry'
> > that contains start and length of source ranges (in sectors).
> >
> > Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> > bio with control information as payload and submit to the device.
> > REQ_OP_COPY(19) is a write op and takes zone_write_lock when submitted
> > to zoned device.
> >
> > If the device doesn't support copy or copy offload is disabled, then
> > copy operation is emulated by default. However, the copy-emulation is an
> > opt-in feature. Caller can choose not to use the copy-emulation by
> > specifying a flag 'BLKDEV_COPY_NOEMULATION'.
> >
> > Copy-emulation is implemented by allocating memory of total copy size.
> > The source ranges are read into memory by chaining bio for each source
> > ranges and submitting them async and the last bio waits for completion.
> > After data is read, it is written to the destination.
> >
> > bio_map_kern() is used to allocate bio and add pages of copy buffer to
> > bio. As bio->bi_private and bio->bi_end_io are needed for chaining the
> > bio and gets over-written, invalidate_kernel_vmap_range() for read is
> > called in the caller.
> >
> > Introduce queue limits for simple copy and other helper functions.
> > Add device limits as sysfs entries.
> >   - copy_offload
> >   - max_copy_sectors
> >   - max_copy_ranges_sectors
> >   - max_copy_nr_ranges
> >
> > copy_offload(= 0) is disabled by default. This needs to be enabled if
> > copy-offload needs to be used.
> > max_copy_sectors = 0 indicates the device doesn't support native copy.
> >
> > Native copy offload is not supported for stacked devices and is done via
> > copy emulation.
> >
> > Signed-off-by: SelvaKumar S 
> > Signed-off-by: Kanchan Joshi 
> > Signed-off-by: Nitesh Shetty 
> > Signed-off-by: Javier González 
> > Signed-off-by: Chaitanya Kulkarni 
> > ---
> >  block/blk-core.c  | 102 --
> >  block/blk-lib.c   | 222 ++
> >  block/blk-merge.c |   2 +
> >  block/blk-settings.c  |  10 ++
> >  block/blk-sysfs.c |  47 
> >  block/blk-zoned.c |   1 +
> >  block/bounce.c|   1 +
> >  block/ioctl.c |  33 ++
> >  include/linux/bio.h   |   1 +
> >  include/linux/blk_types.h |  14 +++
> >  include/linux/blkdev.h|  15 +++
> >  include/uapi/linux/fs.h   |  13 +++
> >  12 files changed, 453 insertions(+), 8 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 7663a9b94b80..23e646e5ae43 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -720,6 +720,17 @@ static noinline int should_fail_bio(struct bio *bio)
> >  }
> >  ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO);
> >
> > +static inline int bio_check_copy_eod(struct bio *bio, sector_t start,
> > + sector_t nr_sectors, sector_t max_sect)
> > +{
> > + if (nr_sectors && max_sect &&
> > + (nr_sectors > max_sect || start > max_sect - nr_sectors)) {
> > + handle_bad_sector(bio, max_sect);
> > + return -EIO;
> > + }
> > + return 0;
> > +}
> > +
> >  /*
> >   * Check whether this bio extends beyond the end of the device or partition.
> >   * This may well happen - the kernel calls bread() without checking the size of
> > @@ -738,6 +749,75 @@ static inline int bio_check_eod(struct bio *bio, sector_t maxsector)
> >   return 0;
> >  }
> >
> > +/*
> > + * Check for copy limits and remap source ranges if needed.
> > + */
> > +static int blk_check_copy(struct bio *bio)
> > +{
> > + struct blk_copy_payload *payload = bio_data(bio);
> > + struct request_queue *q = bio->bi_disk->queue;
> > + sector_t max_sect, start_sect, copy_size = 0;
> > + sector_t src_max_sect, src_start_sect;
> > + struct block_device *bd_part;
> > + int i, ret = -EIO;
> > +
> > + rcu_read_lock();
> > +
> > + bd_part = __disk_get_part(bio->bi_disk, bio->bi_partno);
> > + if (unlikely(!bd_part)) {
> > + rcu_read_unlock();
> > + goto out;
> > + }
> > +
> > + 

Re: [RFC PATCH v5 2/4] block: add simple copy support

2021-02-19 Thread Damien Le Moal
On 2021/02/20 11:01, SelvaKumar S wrote:
> Add new BLKCOPY ioctl that offloads copying of one or more sources
> ranges to a destination in the device. Accepts a 'copy_range' structure
> that contains destination (in sectors), no of sources and pointer to the
> array of source ranges. Each source range is represented by 'range_entry'
> that contains start and length of source ranges (in sectors).
> 
> Introduce REQ_OP_COPY, a no-merge copy offload operation. Create
> bio with control information as payload and submit to the device.
> REQ_OP_COPY(19) is a write op and takes zone_write_lock when submitted
> to zoned device.
> 
> If the device doesn't support copy or copy offload is disabled, then
> copy operation is emulated by default. However, the copy-emulation is an
> opt-in feature. Caller can choose not to use the copy-emulation by
> specifying a flag 'BLKDEV_COPY_NOEMULATION'.
> 
> Copy-emulation is implemented by allocating memory of total copy size.
> The source ranges are read into memory by chaining bio for each source
> ranges and submitting them async and the last bio waits for completion.
> After data is read, it is written to the destination.
> 
> bio_map_kern() is used to allocate bio and add pages of copy buffer to
> bio. As bio->bi_private and bio->bi_end_io are needed for chaining the
> bio and gets over-written, invalidate_kernel_vmap_range() for read is
> called in the caller.
> 
> Introduce queue limits for simple copy and other helper functions.
> Add device limits as sysfs entries.
>   - copy_offload
>   - max_copy_sectors
>   - max_copy_ranges_sectors
>   - max_copy_nr_ranges
> 
> copy_offload(= 0) is disabled by default. This needs to be enabled if
> copy-offload needs to be used.
> max_copy_sectors = 0 indicates the device doesn't support native copy.
> 
> Native copy offload is not supported for stacked devices and is done via
> copy emulation.
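
[Editorial aside, not part of the quoted patch: a userspace invocation of the
proposed BLKCOPY ioctl could look roughly like the sketch below. The struct
layouts, field names and the BLKCOPY definition are illustrative guesses based
on the description above; the real definitions live in the patched
include/uapi/linux/fs.h.]

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/ioctl.h>

	/* Guessed layouts; see the patched include/uapi/linux/fs.h. */
	struct range_entry {
		uint64_t src;		/* source start, in sectors */
		uint64_t len;		/* length, in sectors */
	};

	struct copy_range {
		uint64_t dest;		/* destination start, in sectors */
		uint64_t nr_range;	/* number of source ranges */
		uint64_t range_list;	/* pointer to range_entry array */
	};

	#ifndef BLKCOPY
	#define BLKCOPY _IOW(0x12, 142, struct copy_range) /* placeholder number */
	#endif

	int main(void)
	{
		struct range_entry ranges[2] = {
			{ .src = 0,  .len = 8  },
			{ .src = 64, .len = 16 },
		};
		struct copy_range cr = {
			.dest = 1024,
			.nr_range = 2,
			.range_list = (uint64_t)(uintptr_t)ranges,
		};
		int fd = open("/dev/nvme0n1", O_RDWR);

		if (fd < 0 || ioctl(fd, BLKCOPY, &cr) < 0)
			perror("BLKCOPY");
		return 0;
	}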
> 
> Signed-off-by: SelvaKumar S 
> Signed-off-by: Kanchan Joshi 
> Signed-off-by: Nitesh Shetty 
> Signed-off-by: Javier González 
> Signed-off-by: Chaitanya Kulkarni 
> ---
>  block/blk-core.c  | 102 --
>  block/blk-lib.c   | 222 ++
>  block/blk-merge.c |   2 +
>  block/blk-settings.c  |  10 ++
>  block/blk-sysfs.c |  47 
>  block/blk-zoned.c |   1 +
>  block/bounce.c|   1 +
>  block/ioctl.c |  33 ++
>  include/linux/bio.h   |   1 +
>  include/linux/blk_types.h |  14 +++
>  include/linux/blkdev.h|  15 +++
>  include/uapi/linux/fs.h   |  13 +++
>  12 files changed, 453 insertions(+), 8 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 7663a9b94b80..23e646e5ae43 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -720,6 +720,17 @@ static noinline int should_fail_bio(struct bio *bio)
>  }
>  ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO);
>  
> +static inline int bio_check_copy_eod(struct bio *bio, sector_t start,
> + sector_t nr_sectors, sector_t max_sect)
> +{
> + if (nr_sectors && max_sect &&
> + (nr_sectors > max_sect || start > max_sect - nr_sectors)) {
> + handle_bad_sector(bio, max_sect);
> + return -EIO;
> + }
> + return 0;
> +}
> +
>  /*
>   * Check whether this bio extends beyond the end of the device or partition.
>   * This may well happen - the kernel calls bread() without checking the size of
> @@ -738,6 +749,75 @@ static inline int bio_check_eod(struct bio *bio, sector_t maxsector)
>   return 0;
>  }
>  
> +/*
> + * Check for copy limits and remap source ranges if needed.
> + */
> +static int blk_check_copy(struct bio *bio)
> +{
> + struct blk_copy_payload *payload = bio_data(bio);
> + struct request_queue *q = bio->bi_disk->queue;
> + sector_t max_sect, start_sect, copy_size = 0;
> + sector_t src_max_sect, src_start_sect;
> + struct block_device *bd_part;
> + int i, ret = -EIO;
> +
> + rcu_read_lock();
> +
> + bd_part = __disk_get_part(bio->bi_disk, bio->bi_partno);
> + if (unlikely(!bd_part)) {
> + rcu_read_unlock();
> + goto out;
> + }
> +
> + max_sect =  bdev_nr_sectors(bd_part);
> + start_sect = bd_part->bd_start_sect;
> +
> + src_max_sect = bdev_nr_sectors(payload->src_bdev);
> + src_start_sect = payload->src_bdev->bd_start_sect;
> +
> + if (unlikely(should_fail_request(bd_part, bio->bi_iter.bi_size)))
> + goto out;
> +
> + if (unlikely(bio_check_ro(bio, bd_part)))
> + goto out;

There is no rcu_unlock() in that out label. Did you test ?
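
One possible shape for a fix, sketched here (not the author's code), is to
funnel the early error returns through a label that drops the RCU read lock:

	if (unlikely(should_fail_request(bd_part, bio->bi_iter.bi_size)))
		goto out_unlock;

	if (unlikely(bio_check_ro(bio, bd_part)))
		goto out_unlock;

	rcu_read_unlock();

	/* ... remaining checks that no longer need RCU ... */

	return 0;

out_unlock:
	rcu_read_unlock();
out:
	return ret;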

> +
> + rcu_read_unlock();
> +
> + /* cannot handle copy crossing nr_ranges limit */
> + if (payload->copy_nr_ranges > q->limits.max_copy_nr_ranges)
> + goto out;
> +
> + for (i = 0; i < payload->copy_nr_ranges; i++) {
> + ret = bio_check_copy_eod(bio, payload->range[i].src,
> +