Re: [PATCH] dm: error: Add support for zoned block devices

2023-10-24 Thread Damien Le Moal
On 10/25/23 14:22, Naohiro Aota wrote:
> On Tue, Oct 24, 2023 at 07:45:13AM +0900, Damien Le Moal wrote:
>> dm-error is used in several test cases in the xfstests test suite to
>> check the handling of IO errors in file syatems. However, with several
>   ^ systems
> 
>> file systems getting native support for zoned block devices (e.g. btrfs
>> and f2fs), dm-error's lack of zoned block device support creates problems
>> as these file systems attempt to execute zone commands (e.g. a zone append
>> operation) against a dm-error non-zoned block device, which causes
>> various issues in the block layer (e.g. WARN_ON triggers).
>>
>> This patch adds support for zoned block devices to dm-error, allowing
>> an error table to be exposed as a zoned block device. This is done by
>> relying on the first argument passed to dmsetup when creating the device
>> table: if that first argument is a path to a backing block device, the
>> dm-error device is created by copying the limits of the backing device,
>> thus also copying its zone model. This is consistent with how xfstests
>> creates dm-error devices (always passing the path to the backing device
>> as the first argument).
>>
>> The zone support for dm-error requires the definition of the
>> report_zones target type method, which is done by introducing the
>> function io_err_report_zones(). Given that this function fails report
>> zones operations (similarly to any other command issued to the dm-error
>> device), dm_set_zones_restrictions() is tweaked to do nothing for a
>> wildcard target to avoid failing zone revalidation. As the dm-error
>> target does not implemt the iterate_devices method,
>   ^implement

Thanks. Will fix that.

> 
>> dm_table_supports_zoned_model() is also changed to return true.
>>
>> Signed-off-by: Damien Le Moal 
>> ---
>>  drivers/md/dm-table.c  |  3 +++
>>  drivers/md/dm-target.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>  drivers/md/dm-zone.c   |  9 +++++++++
>>  3 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index 37b48f63ae6a..5e4d887063d3 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -1600,6 +1600,9 @@ static bool dm_table_supports_zoned_model(struct dm_table *t,
>>  for (unsigned int i = 0; i < t->num_targets; i++) {
>>  struct dm_target *ti = dm_table_get_target(t, i);
>>  
>> +if (dm_target_is_wildcard(ti->type))
>> +continue;
>> +
> 
> This seems tricky to me. Currently, dm-error is the only dm target having
> DM_TARGET_WILDCARD. But, can we expect that will be so forever?

Yes, I saw that. Not sure. Mike ?

> Also, I considered what happens if the backing device is non-zoned
> one. dm_table_supports_zoned_model() returns true in that case. But, it is
> still reported as non-zoned as we copy non-zoned queue limits. So, it is
> OK ... but it is a bit tricky.

Returning true for dm_table_supports_zoned_model() is not an issue. As the name
of the function says, this checks that the device table is compatible with a
zoned device but does not force the dm device to be zoned.

> Instead, how about implementing the .iterate_devices just like
> linear_iterate_devices?

Because that would change how dm-error needs to be set up. Currently, there are
no arguments needed for the table entry for dm-error. If we define the
->iterate_devices operation, we'll need a backing device and mapping start
sector. Changing the dmsetup command line for all users is probably not the best
approach... Unless I am missing something...
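
To illustrate (sizes and device names made up), today the trailing
arguments are simply ignored, so both of these work:

  dmsetup create err --table "0 2097152 error"
  dmsetup create err --table "0 2097152 error /dev/nullb0"

whereas with ->iterate_devices implemented, a backing device path and a
start sector would become mandatory, as for dm-linear:

  dmsetup create err --table "0 2097152 error /dev/nullb0 0"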

>> diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
>> index 27e2992ff249..1bf4ecda3012 100644
>> --- a/drivers/md/dm-target.c
>> +++ b/drivers/md/dm-target.c
>> @@ -118,6 +118,21 @@ EXPORT_SYMBOL(dm_unregister_target);
>>   */
>>  static int io_err_ctr(struct dm_target *tt, unsigned int argc, char **args)
>>  {
>> +struct dm_dev *ddev;
>> +int ret;
>> +
>> +/*
>> + * If we have an argument, assume it is the path to the target
>> + * block device we are replacing. In this case, get the device
>> + * so that we can copy its limits in io_err_io_hints().
>> + */
>> +if (argc) {
>> +ret = dm_get_device(tt, args[0], dm_table_get_mode(tt->table),
>> +&ddev);
>> +if (ret == 0)
>> +tt->private = ddev;
> 
> No need to handle an error case here? Or, I guess it ignores an error for
> compatibility when non-device argument is specified. Then, I'd like to add
> a note here.

Yes, exactly. Before the change, arguments were ignored. I will add a comment.
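
For reference, the reworked hunk could look like this (comment wording is
mine, not final):

    /*
     * If we have at least one argument, assume it is the path to the
     * target block device we are replacing, and get that device so that
     * io_err_io_hints() can copy its limits. Failures are ignored so
     * that existing users passing arbitrary (previously unused)
     * arguments keep working as before.
     */
    if (argc) {
        ret = dm_get_device(tt, args[0], dm_table_get_mode(tt->table),
                            &ddev);
        if (ret == 0)
            tt->private = ddev;
    }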

-- 
Damien Le Moal
Western Digital Research




[PATCH] dm: error: Add support for zoned block devices

2023-10-23 Thread Damien Le Moal
dm-error is used in several test cases in the xfstests test suite to
check the handling of IO errors in file systems. However, with several
file systems getting native support for zoned block devices (e.g. btrfs
and f2fs), dm-error's lack of zoned block device support creates problems
as these file systems attempt to execute zone commands (e.g. a zone append
operation) against a dm-error non-zoned block device, which causes
various issues in the block layer (e.g. WARN_ON triggers).

This patch adds support for zoned block devices to dm-error, allowing
an error table to be exposed as a zoned block device. This is done by
relying on the first argument passed to dmsetup when creating the device
table: if that first argument is a path to a backing block device, the
dm-error device is created by copying the limits of the backing device,
thus also copying its zone model. This is consistent with how xfstests
creates dm-error devices (always passing the path to the backing device
as the first argument).

The zone support for dm-error requires the definition of the
report_zones target type method, which is done by introducing the
function io_err_report_zones(). Given that this function fails report
zones operations (similarly to any other command issued to the dm-error
device), dm_set_zones_restrictions() is tweaked to do nothing for a
wildcard target to avoid failing zone revalidation. As the dm-error
target does not implement the iterate_devices method,
dm_table_supports_zoned_model() is also changed to return true.

Signed-off-by: Damien Le Moal 
---
 drivers/md/dm-table.c  |  3 +++
 drivers/md/dm-target.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 drivers/md/dm-zone.c   |  9 +++++++++
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 37b48f63ae6a..5e4d887063d3 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1600,6 +1600,9 @@ static bool dm_table_supports_zoned_model(struct dm_table *t,
for (unsigned int i = 0; i < t->num_targets; i++) {
struct dm_target *ti = dm_table_get_target(t, i);
 
+   if (dm_target_is_wildcard(ti->type))
+   continue;
+
if (dm_target_supports_zoned_hm(ti->type)) {
if (!ti->type->iterate_devices ||
ti->type->iterate_devices(ti, 
device_not_zoned_model,
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 27e2992ff249..1bf4ecda3012 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -118,6 +118,21 @@ EXPORT_SYMBOL(dm_unregister_target);
  */
 static int io_err_ctr(struct dm_target *tt, unsigned int argc, char **args)
 {
+   struct dm_dev *ddev;
+   int ret;
+
+   /*
+* If we have an argument, assume it is the path to the target
+* block device we are replacing. In this case, get the device
+* so that we can copy its limits in io_err_io_hints().
+*/
+   if (argc) {
+   ret = dm_get_device(tt, args[0], dm_table_get_mode(tt->table),
+   &ddev);
+   if (ret == 0)
+   tt->private = ddev;
+   }
+
/*
 * Return error for discards instead of -EOPNOTSUPP
 */
@@ -129,7 +144,10 @@ static int io_err_ctr(struct dm_target *tt, unsigned int argc, char **args)
 
 static void io_err_dtr(struct dm_target *tt)
 {
-   /* empty */
+   struct dm_dev *ddev = tt->private;
+
+   if (ddev)
+   dm_put_device(tt, ddev);
 }
 
 static int io_err_map(struct dm_target *tt, struct bio *bio)
@@ -149,8 +167,27 @@ static void io_err_release_clone_rq(struct request *clone,
 {
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static int io_err_report_zones(struct dm_target *ti,
+   struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+   return -EIO;
+}
+#else
+#define io_err_report_zones NULL
+#endif
+
 static void io_err_io_hints(struct dm_target *ti, struct queue_limits *limits)
 {
+   struct dm_dev *ddev = ti->private;
+
+   /* If we have a target device, copy its limits */
+   if (ddev) {
+   struct request_queue *q = bdev_get_queue(ddev->bdev);
+
+   memcpy(limits, &q->limits, sizeof(*limits));
+   }
+
limits->max_discard_sectors = UINT_MAX;
limits->max_hw_discard_sectors = UINT_MAX;
limits->discard_granularity = 512;
@@ -166,7 +203,7 @@ static long io_err_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 static struct target_type error_target = {
.name = "error",
.version = {1, 6, 0},
-   .features = DM_TARGET_WILDCARD,
+   .features = DM_TARGET_WILDCARD | DM_TARGET_ZONED_HM,
.ctr  = io_err_ctr,
.dtr  = io_err_dtr,
.map  = io_err_map,
@@ -174,6 +211,7 @@ static struct target_type error_target = {
.release_clone_rq =

Re: [dm-devel] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/27/23 17:55, Qi Zheng wrote:
>>>   goto err;
>>>   }
>>>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
>>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
>>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
>>> +    zmd->mblk_shrinker->private_data = zmd;
>>> +
>>> +    shrinker_register(zmd->mblk_shrinker);
>>
>> I fail to see how this new shrinker API is better... Why isn't there a
>> shrinker_alloc_and_register() function ? That would avoid adding all this 
>> code
>> all over the place as the new API call would be very similar to the current
>> shrinker_register() call with static allocation.
> 
> In some registration scenarios, memory needs to be allocated in advance.
> So we continue to use the previous prealloc/register_prepared()
> algorithm. The shrinker_alloc_and_register() is just a helper function
> that combines the two, and this increases the number of APIs that
> shrinker exposes to the outside, so I choose not to add this helper.

And that results in more code in many places instead of less code + a simple
inline helper in the shrinker header file... So not adding that super simple
helper is not exactly the best choice in my opinion.
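
Something along these lines (a sketch against the API introduced in this
series; exact prototypes may differ):

    static inline struct shrinker *
    shrinker_alloc_and_register(unsigned int flags, const char *name,
                unsigned long (*count)(struct shrinker *,
                                       struct shrink_control *),
                unsigned long (*scan)(struct shrinker *,
                                      struct shrink_control *),
                void *private_data)
    {
        struct shrinker *s = shrinker_alloc(flags, "%s", name);

        if (!s)
            return NULL;
        s->count_objects = count;
        s->scan_objects = scan;
        s->seeks = DEFAULT_SEEKS;
        s->private_data = private_data;
        shrinker_register(s);
        return s;
    }

With that, the dm-zoned hunk above collapses to a single call (the
formatted shrinker name with device numbers is omitted for brevity):

    zmd->mblk_shrinker = shrinker_alloc_and_register(0, "dm-zoned-meta",
                dmz_mblock_shrinker_count, dmz_mblock_shrinker_scan, zmd);
    if (!zmd->mblk_shrinker)
        goto err;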

-- 
Damien Le Moal
Western Digital Research



Re: [dm-devel] [PATCH v13 3/9] block: add emulation for copy

2023-07-10 Thread Damien Le Moal
nt len, gfp_t gfp_mask)
>  {
>   unsigned long kaddr = (unsigned long)data;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 336146798e56..f8c80940c7ad 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -562,4 +562,9 @@ struct cio {
>   atomic_t refcount;
>  };
>  
> +struct copy_ctx {
> + struct cio *cio;
> + struct work_struct dispatch_work;
> + struct bio *write_bio;
> +};
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 963f5c97dec0..c176bf6173c5 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1047,6 +1047,9 @@ ssize_t blkdev_copy_offload(
>   struct block_device *bdev_in, loff_t pos_in,
>   struct block_device *bdev_out, loff_t pos_out,
>   size_t len, cio_iodone_t end_io, void *private, gfp_t gfp_mask);
> +struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int 
> len,
> + gfp_t gfp_mask);
> +void bio_map_kern_endio(struct bio *bio);
>  
>  #define BLKDEV_ZERO_NOUNMAP  (1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK   (1 << 1)  /* don't write explicit 
> zeroes */

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v13 1/9] block: Introduce queue limits for copy-offload support

2023-07-10 Thread Damien Le Moal
unsigned short  max_segments;
>   unsigned short  max_integrity_segments;
>   unsigned short  max_discard_segments;
> @@ -554,6 +557,7 @@ struct request_queue {
>  #define QUEUE_FLAG_NOWAIT   29   /* device supports NOWAIT */
>  #define QUEUE_FLAG_SQ_SCHED 30   /* single queue style io dispatch */
>  #define QUEUE_FLAG_SKIP_TAGSET_QUIESCE   31 /* quiesce_tagset skip the 
> queue*/
> +#define QUEUE_FLAG_COPY  32  /* enable/disable device copy 
> offload */
>  
>  #define QUEUE_FLAG_MQ_DEFAULT((1UL << QUEUE_FLAG_IO_STAT) |  
> \
>(1UL << QUEUE_FLAG_SAME_COMP) |\
> @@ -574,6 +578,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, 
> struct request_queue *q);
>   test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
>  #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)  test_bit(QUEUE_FLAG_ADD_RANDOM, 
> &(q)->queue_flags)
> +#define blk_queue_copy(q)test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)   \
>   test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> @@ -891,6 +896,8 @@ extern void blk_queue_chunk_sectors(struct request_queue 
> *, unsigned int);
>  extern void blk_queue_max_segments(struct request_queue *, unsigned short);
>  extern void blk_queue_max_discard_segments(struct request_queue *,
>   unsigned short);
> +extern void blk_queue_max_copy_sectors_hw(struct request_queue *q,
> + unsigned int max_copy_sectors);
>  void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>   unsigned int max_sectors);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
> @@ -1210,6 +1217,11 @@ static inline unsigned int 
> bdev_discard_granularity(struct block_device *bdev)
>   return bdev_get_queue(bdev)->limits.discard_granularity;
>  }
>  
> +static inline unsigned int bdev_max_copy_sectors(struct block_device *bdev)
> +{
> + return bdev_get_queue(bdev)->limits.max_copy_sectors;
> +}
> +
>  static inline unsigned int
>  bdev_max_secure_erase_sectors(struct block_device *bdev)
>  {
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index b7b56871029c..dff56813f95a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,9 @@ struct fstrim_range {
>   __u64 minlen;
>  };
>  
> +/* maximum copy offload length, this is set to 128MB based on current 
> testing */
> +#define COPY_MAX_BYTES   (1 << 27)
> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions 
> */
>  #define FILE_DEDUPE_RANGE_SAME   0
>  #define FILE_DEDUPE_RANGE_DIFFERS1

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v13 4/9] fs, block: copy_file_range for def_blk_ops for direct block device

2023-07-10 Thread Damien Le Moal
   else
> @@ -1708,7 +1709,9 @@ int generic_file_rw_checks(struct file *file_in, struct 
> file *file_out)
>   /* Don't copy dirs, pipes, sockets... */
>   if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
>   return -EISDIR;
> - if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> +
> + if ((!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode)) &&
> + (!S_ISBLK(inode_in->i_mode) || !S_ISBLK(inode_out->i_mode)))
>   return -EINVAL;
>  
>   if (!(file_in->f_mode & FMODE_READ) ||
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c176bf6173c5..850168cad080 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1047,6 +1047,10 @@ ssize_t blkdev_copy_offload(
>   struct block_device *bdev_in, loff_t pos_in,
>   struct block_device *bdev_out, loff_t pos_out,
>   size_t len, cio_iodone_t end_io, void *private, gfp_t gfp_mask);
> +ssize_t blkdev_copy_offload_failfast(
> + struct block_device *bdev_in, loff_t pos_in,
> + struct block_device *bdev_out, loff_t pos_out,
> + size_t len, gfp_t gfp_mask);
>  struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int 
> len,
>   gfp_t gfp_mask);
>  void bio_map_kern_endio(struct bio *bio);

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v13 2/9] block: Add copy offload support infrastructure

2023-07-10 Thread Damien Le Moal
EQ_OP_DRV_OUT  = (__force blk_opf_t)35,
> @@ -482,6 +485,12 @@ static inline bool op_is_write(blk_opf_t op)
>   return !!(op & (__force blk_opf_t)1);
>  }
>  
> +static inline bool op_is_copy(blk_opf_t op)
> +{
> + return (((op & REQ_OP_MASK) == REQ_OP_COPY_SRC) ||
> + ((op & REQ_OP_MASK) == REQ_OP_COPY_DST));
> +}
> +
>  /*
>   * Check if the bio or request is one that needs special treatment in the
>   * flush state machine.
> @@ -541,4 +550,16 @@ struct blk_rq_stat {
>   u64 batch;
>  };
>  
> +typedef void (cio_iodone_t)(void *private, int comp_len);
> +
> +struct cio {
> + struct task_struct *waiter; /* waiting task (NULL if none) */
> + loff_t pos_in;
> + loff_t pos_out;
> + ssize_t comp_len;
> + cio_iodone_t *endio;/* applicable for async operation */
> + void *private;  /* applicable for async operation */
> + atomic_t refcount;
> +};
> +
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 6098665953e6..963f5c97dec0 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1043,6 +1043,10 @@ int __blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>   sector_t nr_sects, gfp_t gfp);
> +ssize_t blkdev_copy_offload(
> + struct block_device *bdev_in, loff_t pos_in,
> + struct block_device *bdev_out, loff_t pos_out,
> + size_t len, cio_iodone_t end_io, void *private, gfp_t gfp_mask);
>  
>  #define BLKDEV_ZERO_NOUNMAP  (1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK   (1 << 1)  /* don't write explicit 
> zeroes */

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v11 1/9] block: Introduce queue limits for copy-offload support

2023-05-22 Thread Damien Le Moal
larity;
>  
> + unsigned long   max_copy_sectors_hw;
> + unsigned long   max_copy_sectors;

Why unsigned long ? unsigned int gives you 4G * 512 = 2TB max copy. Isn't that a
large enough max limit ?

> +
>   unsigned short  max_segments;
>   unsigned short  max_integrity_segments;
>   unsigned short  max_discard_segments;
> @@ -561,6 +564,7 @@ struct request_queue {
>  #define QUEUE_FLAG_NOWAIT   29   /* device supports NOWAIT */
>  #define QUEUE_FLAG_SQ_SCHED 30   /* single queue style io dispatch */
>  #define QUEUE_FLAG_SKIP_TAGSET_QUIESCE   31 /* quiesce_tagset skip the 
> queue*/
> +#define QUEUE_FLAG_COPY  32  /* supports copy offload */

Nope. That is misleading. This flag is cleared in queue_copy_offload_store() if
the user writes 0. So this flag indicates that copy offload is enabled or
disabled, not that the device supports it. For the device support, one has to
look at max copy sectors hw being != 0.
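
That is, the two questions are answered by different things (names taken
from this patch):

    if (blk_queue_copy(q))                /* user has copy offload enabled */
            ...
    if (q->limits.max_copy_sectors_hw)    /* device supports copy offload */
            ...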

>  
>  #define QUEUE_FLAG_MQ_DEFAULT((1UL << QUEUE_FLAG_IO_STAT) |  
> \
>(1UL << QUEUE_FLAG_SAME_COMP) |\
> @@ -581,6 +585,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, 
> struct request_queue *q);
>   test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
>  #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
>  #define blk_queue_add_random(q)  test_bit(QUEUE_FLAG_ADD_RANDOM, 
> &(q)->queue_flags)
> +#define blk_queue_copy(q)test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
>  #define blk_queue_zone_resetall(q)   \
>   test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
>  #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> @@ -899,6 +904,8 @@ extern void blk_queue_chunk_sectors(struct request_queue 
> *, unsigned int);
>  extern void blk_queue_max_segments(struct request_queue *, unsigned short);
>  extern void blk_queue_max_discard_segments(struct request_queue *,
>   unsigned short);
> +extern void blk_queue_max_copy_sectors_hw(struct request_queue *q,
> + unsigned int max_copy_sectors);
>  void blk_queue_max_secure_erase_sectors(struct request_queue *q,
>   unsigned int max_sectors);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
> @@ -1218,6 +1225,11 @@ static inline unsigned int 
> bdev_discard_granularity(struct block_device *bdev)
>   return bdev_get_queue(bdev)->limits.discard_granularity;
>  }
>  
> +static inline unsigned int bdev_max_copy_sectors(struct block_device *bdev)
> +{
> + return bdev_get_queue(bdev)->limits.max_copy_sectors;
> +}
> +
>  static inline unsigned int
>  bdev_max_secure_erase_sectors(struct block_device *bdev)
>  {
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index b7b56871029c..8879567791fa 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,9 @@ struct fstrim_range {
>   __u64 minlen;
>  };
>  
> +/* maximum total copy length */
> +#define COPY_MAX_BYTES   (1 << 27)

My brain bit shifter is not as good as a CPU one :) So it would be nice to
mention what value that is in MB and also explain where this magic value comes 
from.
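
E.g. something like this (my wording):

    /* maximum total copy length: 1 << 27 bytes = 128 MiB, based on testing */
    #define COPY_MAX_BYTES   (1 << 27)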

> +
>  /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions 
> */
>  #define FILE_DEDUPE_RANGE_SAME   0
>  #define FILE_DEDUPE_RANGE_DIFFERS1

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] Does dm-zoned support buffered write?

2023-05-15 Thread Damien Le Moal
ng super block to sdx block 196608
>> Syncing disk
>> Done.
>>
> Hmm. I don't actually see how many CMR zones the drive has.
> 
>> root@smr_dev:~# dmzadm --start /dev/sdx
>> /dev/sdx: 50782535680 512-byte sectors (24215 GiB)
>>Host-managed device
>>96860 zones, offset 0
>>96860 zones of 524288 512-byte sectors (256 MiB)
>>65536 4KB data blocks per zone
>> sdx: starting dmz-5000cca2bfc0db21, metadata ver. 2, uuid
>> 7495e21a-23d9-49f4-832a-76b32136078b
>>
>> root@smr_dev:~# mkfs.ext4 /dev/dm-0
>> mke2fs 1.44.5 (15-Dec-2018)
>> Discarding device blocks: done
>> Creating filesystem with 6346375168 4k blocks and 396648448 inodes
>> Filesystem UUID: c47de06d-6cf6-4a85-9502-7830ca2f4526
>> Superblock backups stored on blocks:
>>  32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>  4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>  102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>  2560000000, 3855122432, 5804752896
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (262144 blocks): done
>> Writing superblocks and filesystem accounting information:
>>
>> ===
>> At another terminal,
>>
>> root@smr_dev:~# ps aux | grep mkfs.ext4
>> root 1411791  2.8  0.0  30992 19864 pts/1D+   01:30   0:01
>> mkfs.ext4 /dev/dm-0
>> root 1413640  0.0  0.0  13972  2496 pts/0S+   01:31   0:00
>> grep mkfs.ext4
>>
>> root@smr_dev:~# cat /proc/1411791/stack
>> [<0>] wait_on_page_bit+0x133/0x270
>> [<0>] wait_on_page_writeback+0x25/0x70
>> [<0>] __filemap_fdatawait_range+0x86/0xf0
>> [<0>] file_write_and_wait_range+0x74/0xb0
>> [<0>] blkdev_fsync+0x16/0x40
>> [<0>] do_fsync+0x38/0x60
>> [<0>] __x64_sys_fsync+0x10/0x20
>> [<0>] do_syscall_64+0x2d/0x70
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Not sure if this is a bug, but doing a simple mkfs.ext4 on dm-zoned with a large
SMR disk can take *a very loong* time. This is because mkfs.ext4 does a lot
of random writes all over the place. So just running that, dm-zoned goes into
heavy GC mode...

To speed things up (and improve runtime performance), use the packed-metadata
format: mkfs.ext4 -E packed_meta_blocks=1
Or do a mkfs.xfs to compare and see how much faster it is.
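
E.g., for the device in your log:

  mkfs.ext4 -E packed_meta_blocks=1 /dev/dm-0

Packing the fixed metadata at the start of the device keeps those random
metadata writes in one contiguous region instead of spread over the whole
device, which is much friendlier to dm-zoned GC.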

> 
> But that just means that we're waiting for I/O to complete; there must 
> be another thread processing the I/O.
> If this is the only active thread in your system something is seriously 
> hosed.
> 
> But I guess I don't need to tell _you_ that :-)
> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research



Re: [dm-devel] [PATCH 18/19] dm-crypt: check if adding pages to clone bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 09:17, Yang Shi wrote:
> On Wed, Mar 29, 2023 at 4:49 PM Damien Le Moal
>  wrote:
>>
>> On 3/30/23 02:06, Johannes Thumshirn wrote:
>>> Check if adding pages to clone bio fails and if bail out.
>>
>> Nope. The code retries with direct reclaim until it succeeds. Which is very
>> suspicious...
> 
> It is not related to bio_add_page() failure. It is used to avoid a
> race condition when two processes are depleting the mempool
> simultaneously.
> 
> IIUC I don't think page merge may happen for dm-crypt, so is
> __bio_add_page() good enough? I'm working on this code too, using
> __bio_add_page() would make my patch easier.

If the BIO was allocated with enough bvecs, we could use that function. But page
merging reduces overhead, so if it can happen, let's use it.

> 
>>
>>>
>>> This way we can mark bio_add_pages as __must_check.
>>>
>>> Signed-off-by: Johannes Thumshirn 
>>
>> With the commit message fixed,
>>
>> Reviewed-by: Damien Le Moal 
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
>>

-- 
Damien Le Moal
Western Digital Research



Re: [dm-devel] [PATCH 19/19] block: mark bio_add_page as __must_check

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Now that all users of bio_add_page check for the return value, mark
> bio_add_page as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 18/19] dm-crypt: check if adding pages to clone bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Check if adding pages to clone bio fails and if bail out.

Nope. The code retries with direct reclaim until it succeeds. Which is very
suspicious...

> 
> This way we can mark bio_add_pages as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

With the commit message fixed,

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 17/19] md: raid1: check if adding pages to resync bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Check if adding pages to resync bio fails and if bail out.
> 
> As the comment above suggests this cannot happen, WARN if it actually
> happens.
> 
> This way we can mark bio_add_pages as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/md/raid1-10.c |  7 ++-
>  drivers/md/raid10.c   | 12 ++--
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index e61f6cad4e08..c21b6c168751 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -105,7 +105,12 @@ static void md_bio_reset_resync_pages(struct bio *bio, 
> struct resync_pages *rp,
>* won't fail because the vec table is big
>* enough to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {

Not sure we really need the WARN_ON here...
Nevertheless,

Reviewed-by: Damien Le Moal 


> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return;
> + }
> +
>   size -= len;
>   } while (idx++ < RESYNC_PAGES && size > 0);
>  }
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6c66357f92f5..5682dba52fd3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3808,7 +3808,11 @@ static sector_t raid10_sync_request(struct mddev 
> *mddev, sector_t sector_nr,
>* won't fail because the vec table is big enough
>* to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + goto giveup;
> + }
>   }
>   nr_sectors += len>>9;
>   sector_nr += len>>9;
> @@ -4989,7 +4993,11 @@ static sector_t reshape_request(struct mddev *mddev, 
> sector_t sector_nr,
>* won't fail because the vec table is big enough
>* to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return sectors_done; /* XXX: is this correct? */
> + }
>   }
>   sector_nr += len >> 9;
>   nr_sectors += len >> 9;

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 16/19] md: raid1: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> The sync request code uses bio_add_page() to add a page to a newly created 
> bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 15/19] md: check for failure when adding pages in alloc_behind_master_bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> alloc_behind_master_bio() can possibly add multiple pages to a bio, but it
> is not checking for the return value of bio_add_page() if adding really
> succeeded.
> 
> Check if the page adding succeeded and if not bail out.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 14/19] floppy: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> The floppy code uses bio_add_page() to add a page to a newly created bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 13/19] zram: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The zram writeback code uses bio_add_page() to add a page to a newly
> created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 12/19] zonefs: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The zonefs superblock reading code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 11/19] gfs: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The GFS superblock reading code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 10/19] jfs: logmgr: use __bio_add_page to add single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The JFS IO code uses bio_add_page() to add a page to a newly created bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 09/19] btrfs: raid56: use __bio_add_page to add single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The btrfs raid56 sector submission code uses bio_add_page() to add a page
> to a newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 08/19] btrfs: repair: use __bio_add_page for adding single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The btrfs repair bio submission code uses bio_add_page() to add a page to
> a newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 07/19] md: raid5: use __bio_add_page to add single page to new bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The raid5-ppl submission code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked. For adding consecutive pages, the return is actually checked and
> a new bio is allocated if adding the page fails.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 05/19] md: use __bio_add_page to add single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The md-raid superblock writing code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 06/19] md: raid5-log: use __bio_add_page to add single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The raid5 log metadata submission code uses bio_add_page() to add a page
> to a newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 03/19] dm: dm-zoned: use __bio_add_page for adding single metadata page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> dm-zoned uses bio_add_page() for adding a single page to a freshly created
> metadata bio.
> 
> Use __bio_add_page() instead as adding a single page to a new bio is
> always guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() __must_check
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 04/19] fs: buffer: use __bio_add_page to add single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The buffer_head submission code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 02/19] drbd: use __bio_add_page to add page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The drbd code only adds a single page to a newly created bio. So use
> __bio_add_page() to add the page which is guaranteed to succeed in this
> case.
> 
> This brings us closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

With Matthew comment addressed,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 01/19] swap: use __bio_add_page to add page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The swap code only adds a single page to a newly created bio. So use
> __bio_add_page() to add the page which is guaranteed to succeed in this
> case.
> 
> This brings us closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v8 1/9] block: Introduce queue limits for copy-offload support

2023-03-29 Thread Damien Le Moal
On 3/29/23 19:41, Nitesh Shetty wrote:
>>> +What:  /sys/block//queue/copy_max_bytes
>>> +Date:  November 2022
>>> +Contact:   linux-bl...@vger.kernel.org
>>> +Description:
>>> +   [RW] While 'copy_max_bytes_hw' is the hardware limit for the
>>> +   device, 'copy_max_bytes' setting is the software limit.
>>> +   Setting this value lower will make Linux issue smaller size
>>> +   copies from block layer.
>>
>>  This is the maximum number of bytes that the block
>> layer will allow for a copy request. Must be smaller than
>> or equal to the maximum size allowed by the hardware 
>> indicated
> 
> Looks good.  Will update in next version. We took reference from discard. 
> 
>>  by copy_max_bytes_hw. Write 0 to use the default kernel
>>  settings.
>>
> 
> Nack, writing 0 will not set it to default value. (default value is
> copy_max_bytes = copy_max_bytes_hw)

It is trivial to make it work that way, which would match how max_sectors_kb
works. Write 0 to return copy_max_bytes being equal to the default
copy_max_bytes_hw.

The other possibility that is also interesting is "write 0 to disable copy
offload and use emulation". This one may actually be more useful.
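
A rough sketch of the store side (helper and variable names assumed, not
from this patch):

    /* queue_copy_max_store(): writing 0 restores the hardware default */
    if (max_copy_bytes == 0)
            max_copy_bytes = q->limits.max_copy_sectors_hw << SECTOR_SHIFT;
    q->limits.max_copy_sectors = max_copy_bytes >> SECTOR_SHIFT;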

> 
>>> +
>>> +
>>> +What:  /sys/block//queue/copy_max_bytes_hw
>>> +Date:  November 2022
>>> +Contact:   linux-bl...@vger.kernel.org
>>> +Description:
>>> +   [RO] Devices that support offloading copy functionality may have
>>> +   internal limits on the number of bytes that can be offloaded
>>> +   in a single operation. The `copy_max_bytes_hw`
>>> +   parameter is set by the device driver to the maximum number of
>>> +   bytes that can be copied in a single operation. Copy
>>> +   requests issued to the device must not exceed this limit.
>>> +   A value of 0 means that the device does not
>>> +   support copy offload.
>>
>>  [RO] This is the maximum number of kilobytes supported in a
>> single data copy offload operation. A value of 0 means that 
>> the
>>  device does not support copy offload.
>>
> 
> Nack, value is in bytes. Same as discard.

Typo. I meant Bytes. Your text is too long and too convoluted, so unclear.

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v8 9/9] null_blk: add support for copy offload

2023-03-29 Thread Damien Le Moal
On 3/27/23 17:40, Anuj Gupta wrote:
> From: Nitesh Shetty 
> 
> Implementation is based on existing read and write infrastructure.
> 
> Suggested-by: Damien Le Moal 
> Signed-off-by: Anuj Gupta 
> Signed-off-by: Nitesh Shetty 
> Signed-off-by: Vincent Fu 
> ---
>  drivers/block/null_blk/main.c | 94 +++
>  drivers/block/null_blk/null_blk.h |  7 +++
>  2 files changed, 101 insertions(+)
> 
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 9e6b032c8ecc..84c5fbcd67a5 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1257,6 +1257,81 @@ static int null_transfer(struct nullb *nullb, struct 
> page *page,
>   return err;
>  }
>  
> +static inline int nullb_setup_copy_read(struct nullb *nullb,
> + struct bio *bio)
> +{
> + struct nullb_copy_token *token = bvec_kmap_local(>bi_io_vec[0]);
> +
> + memcpy(token->subsys, "nullb", 5);
> + token->sector_in = bio->bi_iter.bi_sector;
> + token->nullb = nullb;
> + token->sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> +
> + return 0;
> +}
> +
> +static inline int nullb_setup_copy_write(struct nullb *nullb,
> + struct bio *bio, bool is_fua)
> +{
> + struct nullb_copy_token *token = bvec_kmap_local(>bi_io_vec[0]);
> + sector_t sector_in, sector_out;
> + void *in, *out;
> + size_t rem, temp;
> + unsigned long offset_in, offset_out;
> + struct nullb_page *t_page_in, *t_page_out;
> + int ret = -EIO;
> +
> + if (unlikely(memcmp(token->subsys, "nullb", 5)))
> + return -EOPNOTSUPP;
> + if (unlikely(token->nullb != nullb))
> + return -EOPNOTSUPP;
> + if (WARN_ON(token->sectors != bio->bi_iter.bi_size >> SECTOR_SHIFT))
> + return -EOPNOTSUPP;

EOPNOTSUPP is strange. These are EINVAL, no ?

> +
> + sector_in = token->sector_in;
> + sector_out = bio->bi_iter.bi_sector;
> + rem = token->sectors << SECTOR_SHIFT;
> +
> + spin_lock_irq(>lock);
> + while (rem > 0) {
> + temp = min_t(size_t, nullb->dev->blocksize, rem);
> + offset_in = (sector_in & SECTOR_MASK) << SECTOR_SHIFT;
> + offset_out = (sector_out & SECTOR_MASK) << SECTOR_SHIFT;
> +
> + if (null_cache_active(nullb) && !is_fua)
> + null_make_cache_space(nullb, PAGE_SIZE);
> +
> + t_page_in = null_lookup_page(nullb, sector_in, false,
> + !null_cache_active(nullb));
> + if (!t_page_in)
> + goto err;
> + t_page_out = null_insert_page(nullb, sector_out,
> + !null_cache_active(nullb) || is_fua);
> + if (!t_page_out)
> + goto err;
> +
> + in = kmap_local_page(t_page_in->page);
> + out = kmap_local_page(t_page_out->page);
> +
> + memcpy(out + offset_out, in + offset_in, temp);
> + kunmap_local(out);
> + kunmap_local(in);
> + __set_bit(sector_out & SECTOR_MASK, t_page_out->bitmap);
> +
> + if (is_fua)
> + null_free_sector(nullb, sector_out, true);
> +
> + rem -= temp;
> + sector_in += temp >> SECTOR_SHIFT;
> + sector_out += temp >> SECTOR_SHIFT;
> + }
> +
> + ret = 0;
> +err:
> + spin_unlock_irq(>lock);
> + return ret;
> +}
> +
>  static int null_handle_rq(struct nullb_cmd *cmd)
>  {
>   struct request *rq = cmd->rq;
> @@ -1267,6 +1342,14 @@ static int null_handle_rq(struct nullb_cmd *cmd)
>   struct req_iterator iter;
>   struct bio_vec bvec;
>  
> + if (rq->cmd_flags & REQ_COPY) {
> + if (op_is_write(req_op(rq)))
> + return nullb_setup_copy_write(nullb, rq->bio,
> + rq->cmd_flags & REQ_FUA);
> + else

No need for this else.

> + return nullb_setup_copy_read(nullb, rq->bio);
> + }
> +
>   spin_lock_irq(>lock);
>   rq_for_each_segment(bvec, rq, iter) {
>   len = bvec.bv_len;
> @@ -1294,6 +1377,14 @@ static int null_handle_bio(struct nullb_cmd *cmd)
>   struct bio_vec bvec;
>   struct bvec_iter iter;
>  
> + if (bio->bi_opf & REQ_COPY) {
> + if (op_is_write(bio_op(bio)))
> + return nullb_setup_copy_write(nullb, bio,

Re: [dm-devel] [PATCH v8 7/9] dm: Add support for copy offload.

2023-03-29 Thread Damien Le Moal
On 3/27/23 17:40, Anuj Gupta wrote:
> From: Nitesh Shetty 
> 

Drop the period at the end of the patch title.

> Before enabling copy for dm target, check if underlying devices and
> dm target support copy. Avoid split happening inside dm target.
> Fail early if the request needs split, currently splitting copy
> request is not supported.
> 
> Signed-off-by: Nitesh Shetty 
> ---
>  drivers/md/dm-table.c | 42 +++
>  drivers/md/dm.c   |  7 ++
>  include/linux/device-mapper.h |  5 +
>  3 files changed, 54 insertions(+)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 7899f5fb4c13..45e894b9e3be 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1863,6 +1863,39 @@ static bool dm_table_supports_nowait(struct dm_table 
> *t)
>   return true;
>  }
>  
> +static int device_not_copy_capable(struct dm_target *ti, struct dm_dev *dev,
> +   sector_t start, sector_t len, void *data)
> +{
> + struct request_queue *q = bdev_get_queue(dev->bdev);
> +
> + return !blk_queue_copy(q);
> +}
> +
> +static bool dm_table_supports_copy(struct dm_table *t)
> +{
> + struct dm_target *ti;
> + unsigned int i;
> +
> + for (i = 0; i < t->num_targets; i++) {
> + ti = dm_table_get_target(t, i);
> +
> + if (!ti->copy_offload_supported)
> + return false;
> +
> + /*
> +  * target provides copy support (as implied by setting
> +  * 'copy_offload_supported')
> +  * and it relies on _all_ data devices having copy support.
> +  */
> + if (!ti->type->iterate_devices ||
> +  ti->type->iterate_devices(ti,
> +  device_not_copy_capable, NULL))
> + return false;
> + }
> +
> + return true;
> +}
> +
>  static int device_not_discard_capable(struct dm_target *ti, struct dm_dev 
> *dev,
> sector_t start, sector_t len, void *data)
>  {
> @@ -1945,6 +1978,15 @@ int dm_table_set_restrictions(struct dm_table *t, 
> struct request_queue *q,
>   q->limits.discard_misaligned = 0;
>   }
>  
> + if (!dm_table_supports_copy(t)) {
> + blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
> + /* Must also clear copy limits... */

Not a useful comment. The code is clear.

> + q->limits.max_copy_sectors = 0;
> + q->limits.max_copy_sectors_hw = 0;
> + } else {
> + blk_queue_flag_set(QUEUE_FLAG_COPY, q);
> + }
> +
>   if (!dm_table_supports_secure_erase(t))
>   q->limits.max_secure_erase_sectors = 0;
>  
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 2d0f934ba6e6..08ec51000af8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1693,6 +1693,13 @@ static blk_status_t __split_and_process_bio(struct 
> clone_info *ci)
>   if (unlikely(ci->is_abnormal_io))
>   return __process_abnormal_io(ci, ti);
>  
> + if ((unlikely(op_is_copy(ci->bio->bi_opf)) &&
> + max_io_len(ti, ci->sector) < ci->sector_count)) {
> + DMERR("Error, IO size(%u) > max target size(%llu)\n",
> + ci->sector_count, max_io_len(ti, ci->sector));
> + return BLK_STS_IOERR;
> + }
> +
>   /*
>* Only support bio polling for normal IO, and the target io is
>* exactly inside the dm_io instance (verified in dm_poll_dm_io)
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 7975483816e4..44969a20295e 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -380,6 +380,11 @@ struct dm_target {
>* bio_set_dev(). NOTE: ideally a target should _not_ need this.
>*/
>   bool needs_bio_set_dev:1;
> +
> + /*
> +  * copy offload is supported
> +  */
> + bool copy_offload_supported:1;
>  };
>  
>  void *dm_per_bio_data(struct bio *bio, size_t data_size);

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v8 2/9] block: Add copy offload support infrastructure

2023-03-29 Thread Damien Le Moal
 completion of copy operation,
> + *   for synchronous operation this should be NULL
> + * @private: end_io function will be called with this private data, should be
> + *   NULL, if operation is synchronous in nature
> + * @gfp_mask:   memory allocation flags (for bio_alloc)
> + *
> + * Returns the length of bytes copied or a negative error value
> + *
> + * Description:
> + *   Copy source offset from source block device to destination block
> + *   device. length of a source range cannot be zero. Max total length of
> + *   copy is limited to MAX_COPY_TOTAL_LENGTH
> + */
> +int blkdev_issue_copy(struct block_device *bdev_in, loff_t pos_in,
> +   struct block_device *bdev_out, loff_t pos_out, size_t len,
> +   cio_iodone_t end_io, void *private, gfp_t gfp_mask)
> +{
> + struct request_queue *q_in = bdev_get_queue(bdev_in);
> + struct request_queue *q_out = bdev_get_queue(bdev_out);
> + int ret = -EINVAL;
> +
> + ret = blk_copy_sanity_check(bdev_in, pos_in, bdev_out, pos_out, len);
> + if (ret)
> + return ret;
> +
> + if (blk_check_copy_offload(q_in, q_out))
> + ret = __blk_copy_offload(bdev_in, pos_in, bdev_out, pos_out,
> +len, end_io, private, gfp_mask);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_issue_copy);
> +
>  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>   sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
>   struct bio **biop, unsigned flags)
> diff --git a/block/blk.h b/block/blk.h
> index d65d96994a94..684b8fa121db 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -311,6 +311,8 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
>   break;
>   }
>  
> + if (unlikely(op_is_copy(bio->bi_opf)))
> + return false;
>   /*
>* All drivers must accept single-segments bios that are <= PAGE_SIZE.
>* This is a quick and dirty check that relies on the fact that
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index a0e339ff3d09..7f586c4b9954 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -423,6 +423,7 @@ enum req_flag_bits {
>*/
>   /* for REQ_OP_WRITE_ZEROES: */
>   __REQ_NOUNMAP,  /* do not free blocks when zeroing */
> + __REQ_COPY, /* copy request */
>  
>   __REQ_NR_BITS,  /* stops here */
>  };
> @@ -452,6 +453,7 @@ enum req_flag_bits {
>  
>  #define REQ_DRV  (__force blk_opf_t)(1ULL << __REQ_DRV)
>  #define REQ_SWAP (__force blk_opf_t)(1ULL << __REQ_SWAP)
> +#define REQ_COPY ((__force blk_opf_t)(1ULL << __REQ_COPY))
>  
>  #define REQ_FAILFAST_MASK \
>   (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> @@ -478,6 +480,11 @@ static inline bool op_is_write(blk_opf_t op)
>   return !!(op & (__force blk_opf_t)1);
>  }
>  
> +static inline bool op_is_copy(blk_opf_t op)
> +{
> + return (op & REQ_COPY);

No need for the parentheses.

> +}
> +
>  /*
>   * Check if the bio or request is one that needs special treatment in the
>   * flush state machine.
> @@ -537,4 +544,22 @@ struct blk_rq_stat {
>   u64 batch;
>  };
>  
> +typedef void (cio_iodone_t)(void *private, int comp_len);

Not really needed I think.

> +
> +struct cio {
> + struct task_struct *waiter; /* waiting task (NULL if none) */
> + atomic_t refcount;
> + loff_t pos_in;
> + loff_t pos_out;
> + size_t comp_len;
> + cio_iodone_t *endio;/* applicable for async operation */
> + void *private;  /* applicable for async operation */
> +};
> +
> +struct copy_ctx {
> + struct cio *cio;
> + struct work_struct dispatch_work;
> + struct bio *write_bio;
> +};
> +
>  #endif /* __LINUX_BLK_TYPES_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 200338f2ec2e..1bb43697d43d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1054,6 +1054,9 @@ int __blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   sector_t nr_sects, gfp_t gfp_mask, struct bio **biop);
>  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
>   sector_t nr_sects, gfp_t gfp);
> +int blkdev_issue_copy(struct block_device *bdev_in, loff_t pos_in,
> +   struct block_device *bdev_out, loff_t pos_out, size_t len,
> +   cio_iodone_t end_io, void *private, gfp_t gfp_mask);
>  
>  #define BLKDEV_ZERO_NOUNMAP  (1 << 0)  /* do not free blocks */
>  #define BLKDEV_ZERO_NOFALLBACK   (1 << 1)  /* don't write explicit 
> zeroes */

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v8 1/9] block: Introduce queue limits for copy-offload support

2023-03-29 Thread Damien Le Moal
 a single copy payload
> + * @q:  the request queue for the device
> + * @max_copy_sectors: maximum number of sectors to copy
> + **/
> +void blk_queue_max_copy_sectors_hw(struct request_queue *q,
> + unsigned int max_copy_sectors)
> +{
> + if (max_copy_sectors >= MAX_COPY_TOTAL_LENGTH)

Confusing name as LENGTH may be interpreted as bytes. MAX_COPY_SECTORS would be
better.

> + max_copy_sectors = MAX_COPY_TOTAL_LENGTH;
> +
> + q->limits.max_copy_sectors_hw = max_copy_sectors;
> + q->limits.max_copy_sectors = max_copy_sectors;
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors_hw);


-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 4/7] zonefs: use bdev_zone_no helper to calculate the zone index

2023-01-06 Thread Damien Le Moal
On 1/6/23 17:33, Pankaj Raghav wrote:
> Use bdev_zone_no() helper instead of opencoding the logic to find the
> zone index.
> 
> Signed-off-by: Pankaj Raghav 
> ---
>  fs/zonefs/super.c  | 8 +++-
>  fs/zonefs/zonefs.h | 1 -
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 2c53fbb8d918..c2ddc62e99dc 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -487,7 +487,6 @@ static void __zonefs_io_error(struct inode *inode, bool 
> write)
>  {
>   struct zonefs_inode_info *zi = ZONEFS_I(inode);
>   struct super_block *sb = inode->i_sb;
> - struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
>   unsigned int noio_flag;
>   unsigned int nr_zones = 1;
>   struct zonefs_ioerr_data err = {
> @@ -502,8 +501,8 @@ static void __zonefs_io_error(struct inode *inode, bool write)
>* size is always larger than the device zone size.
>*/
>   if (zi->i_zone_size > bdev_zone_sectors(sb->s_bdev))
> - nr_zones = zi->i_zone_size >>
> - (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
> + nr_zones =
> + bdev_zone_no(sb->s_bdev, zi->i_zone_size >> SECTOR_SHIFT);

This is a number of zones, not a zone number. So this at least needs a
comment to make that clear. Otherwise, I find this confusing.
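Something like this would help (sketch only):

	/*
	 * The inode zone size spans several device zones here, in which
	 * case bdev_zone_no() happens to return the number of device
	 * zones covered, not a zone index.
	 */
	nr_zones = bdev_zone_no(sb->s_bdev, zi->i_zone_size >> SECTOR_SHIFT);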

I would also prefer you hold on this patch. I am about to post a big
series reworking many things in zonefs. That will conflict.

>  
>   /*
>* Memory allocations in blkdev_report_zones() can trigger a memory
> @@ -1420,7 +1419,7 @@ static int zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
>   struct zonefs_inode_info *zi = ZONEFS_I(inode);
>   int ret = 0;
>  
> - inode->i_ino = zone->start >> sbi->s_zone_sectors_shift;
> + inode->i_ino = bdev_zone_no(sb->s_bdev, zone->start);
>   inode->i_mode = S_IFREG | sbi->s_perm;
>  
>   zi->i_ztype = type;
> @@ -1804,7 +1803,6 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
>* interface constraints.
>*/
>   sb_set_blocksize(sb, bdev_zone_write_granularity(sb->s_bdev));
> - sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
>   sbi->s_uid = GLOBAL_ROOT_UID;
>   sbi->s_gid = GLOBAL_ROOT_GID;
>   sbi->s_perm = 0640;
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 1dbe78119ff1..703bc4505b29 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -179,7 +179,6 @@ struct zonefs_sb_info {
>   kgid_t  s_gid;
>   umode_t s_perm;
>   uuid_t  s_uuid;
> - unsigned ints_zone_sectors_shift;
>  
>   unsigned ints_nr_files[ZONEFS_ZTYPE_MAX];
>  

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 3/7] nvmet: introduce bdev_zone_no helper

2023-01-06 Thread Damien Le Moal
On 1/6/23 17:33, Pankaj Raghav wrote:
> A generic bdev_zone_no() helper is added to calculate the zone number for a
> given sector in a block device. This helper internally uses disk_zone_no()
> to find the zone number.
> 
> Use the bdev_zone_no() helper to calculate the number of zones. This lets us
> make modifications to the math, if needed, in one place.
> 
> Signed-off-by: Pankaj Raghav 
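For reference, the helper presumably boils down to something like this
(sketch; the actual patch may differ):

static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
{
	return disk_zone_no(bdev->bd_disk, sec);
}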

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 1/7] block: remove superfluous check for request queue in bdev_is_zoned

2023-01-06 Thread Damien Le Moal
On 1/6/23 17:33, Pankaj Raghav wrote:
> Remove the superfluous request queue check in bdev_is_zoned() as the
> bdev_get_queue can never return NULL.
> 
> Signed-off-by: Pankaj Raghav 
> ---
>  include/linux/blkdev.h | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 43d4e073b111..0e40b014c40b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1283,12 +1283,7 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>  
>  static inline bool bdev_is_zoned(struct block_device *bdev)
>  {
> - struct request_queue *q = bdev_get_queue(bdev);
> -
> - if (q)
> - return blk_queue_is_zoned(q);
> -
> - return false;
> + return blk_queue_is_zoned(bdev_get_queue(bdev));
>  }
>  
>  static inline bool bdev_op_is_zoned_write(struct block_device *bdev,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 2/7] block: add a new helper bdev_{is_zone_start, offset_from_zone_start}

2023-01-06 Thread Damien Le Moal
On 1/6/23 17:33, Pankaj Raghav wrote:
> Instead of open coding the check for zone start, add a helper to improve
> readability and store the logic in one place.
> 
> bdev_offset_from_zone_start() will be used later in the series.
> 
> Signed-off-by: Pankaj Raghav 
> ---
>  block/blk-core.c   |  2 +-
>  block/blk-zoned.c  |  4 ++--
>  include/linux/blkdev.h | 18 ++
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9321767470dc..0405b3144e7a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -573,7 +573,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
>   return BLK_STS_NOTSUPP;
>  
>   /* The bio sector must point to the start of a sequential zone */
> - if (bio->bi_iter.bi_sector & (bdev_zone_sectors(bio->bi_bdev) - 1) ||
> + if (!bdev_is_zone_start(bio->bi_bdev, bio->bi_iter.bi_sector) ||
>   !bio_zone_is_seq(bio))
>   return BLK_STS_IOERR;
>  
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index db829401d8d0..614b575be899 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -277,10 +277,10 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
>   return -EINVAL;
>  
>   /* Check alignment (handle eventual smaller last zone) */
> - if (sector & (zone_sectors - 1))
> + if (!bdev_is_zone_start(bdev, sector))
>   return -EINVAL;
>  
> - if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
> + if (!bdev_is_zone_start(bdev, nr_sectors) && end_sector != capacity)
>   return -EINVAL;
>  
>   /*
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 0e40b014c40b..04b7cbfd7a2a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -715,6 +715,7 @@ static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
>  {
>   return 0;
>  }
> +

Whitespace-only change.

>  static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
>  {
>   return 0;
> @@ -1304,6 +1305,23 @@ static inline sector_t bdev_zone_sectors(struct block_device *bdev)
>   return q->limits.chunk_sectors;
>  }
>  
> +static inline sector_t bdev_offset_from_zone_start(struct block_device *bdev,
> +sector_t sec)
> +{
> + if (!bdev_is_zoned(bdev))

This helper should never be called outside of code supporting zones, so
why this check?

> + return 0;
> +
> + return sec & (bdev_zone_sectors(bdev) - 1);
> +}
> +
> +static inline bool bdev_is_zone_start(struct block_device *bdev, sector_t sec)
> +{
> + if (!bdev_is_zoned(bdev))
> + return false;

Same here.

> +
> + return bdev_offset_from_zone_start(bdev, sec) == 0;
> +}
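In other words, with callers guaranteed to operate on zoned devices, these
could simply be (sketch):

static inline sector_t bdev_offset_from_zone_start(struct block_device *bdev,
						   sector_t sec)
{
	return sec & (bdev_zone_sectors(bdev) - 1);
}

static inline bool bdev_is_zone_start(struct block_device *bdev, sector_t sec)
{
	return bdev_offset_from_zone_start(bdev, sec) == 0;
}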
> +
>  static inline int queue_dma_alignment(const struct request_queue *q)
>  {
>   return q ? q->limits.dma_alignment : 511;

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v5 10/10] fs: add support for copy file range in zonefs

2022-11-30 Thread Damien Le Moal
On 11/30/22 13:17, Nitesh Shetty wrote:
> On Wed, Nov 30, 2022 at 08:45:55AM +0900, Damien Le Moal wrote:
>> On 11/29/22 21:22, Nitesh Shetty wrote:
>>> Acked. I do see a gap in current zonefs cfr implementation. I will drop this
>>
>> cfr ?
>>
> 
> yes, will drop zonefs cfr for next version.

I meant: I do not understand "cfr". I now realize that it probably means
copy-file-range? Please be clear and do not use abbreviations.

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v5 10/10] fs: add support for copy file range in zonefs

2022-11-29 Thread Damien Le Moal
On 11/29/22 21:22, Nitesh Shetty wrote:
> Acked. I do see a gap in current zonefs cfr implementation. I will drop this

cfr ?

> implementation for next version. Once we finalize on block copy offload
> implementation, will re-pick this and send with above reviews fixed.
> 
> Thank you,
> Nitesh

Please trim your replies.


-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v5 10/10] fs: add support for copy file range in zonefs

2022-11-23 Thread Damien Le Moal
On 11/24/22 10:32, Damien Le Moal wrote:
> On 11/23/22 14:58, Nitesh Shetty wrote:
>> copy_file_range is implemented using copy offload,
>> copy offloading to device is always enabled.
>> To disable copy offloading mount with "no_copy_offload" mount option.
> 
> And were is the code that handle this option ?
> 
>> At present copy offload is only used, if the source and destination files
>> are on same block device, otherwise copy file range is completed by
>> generic copy file range.
>>
>> copy file range implemented as following:
>>  - write pending writes on the src and dest files
>>  - drop page cache for dest file if its conv zone
>>  - copy the range using offload
>>  - update dest file info
>>
>> For all failure cases we fallback to generic file copy range
> 
> For all cases ? That would be weird. What would be the point of trying to
> copy again if e.g. the dest zone has gone offline or read only ?
> 
>> At present this implementation does not support conv aggregation
> 
> Please check this commit message overall: the grammar and punctuation
> could really be improved.
> 
>>
>> Signed-off-by: Nitesh Shetty 
>> Signed-off-by: Anuj Gupta 
>> ---
>>  fs/zonefs/super.c | 179 ++
>>  1 file changed, 179 insertions(+)
>>
>> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
>> index abc9a85106f2..15613433d4ae 100644
>> --- a/fs/zonefs/super.c
>> +++ b/fs/zonefs/super.c
>> @@ -1223,6 +1223,183 @@ static int zonefs_file_release(struct inode *inode, struct file *file)
>>  return 0;
>>  }
>>  
>> +static int zonefs_is_file_copy_offset_ok(struct inode *src_inode,
>> +struct inode *dst_inode, loff_t src_off, loff_t dst_off,
>> +size_t *len)
>> +{
>> +loff_t size, endoff;
>> +struct zonefs_inode_info *dst_zi = ZONEFS_I(dst_inode);
>> +
>> +inode_lock(src_inode);
>> +size = i_size_read(src_inode);
>> +inode_unlock(src_inode);
>> +/* Don't copy beyond source file EOF. */
>> +if (src_off < size) {
>> +if (src_off + *len > size)
>> +*len = (size - (src_off + *len));
>> +} else
>> +*len = 0;
> 
> Missing curly brackets for the else.
> 
>> +
>> +mutex_lock(&dst_zi->i_truncate_mutex);
>> +if (dst_zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
>> +if (*len > dst_zi->i_max_size - dst_zi->i_wpoffset)
>> +*len -= dst_zi->i_max_size - dst_zi->i_wpoffset;
>> +
>> +if (dst_off != dst_zi->i_wpoffset)
>> +goto err;
>> +}
>> +mutex_unlock(&dst_zi->i_truncate_mutex);
>> +
>> +endoff = dst_off + *len;
>> +inode_lock(dst_inode);
>> +if (endoff > dst_zi->i_max_size ||
>> +inode_newsize_ok(dst_inode, endoff)) {
>> +inode_unlock(dst_inode);
>> +goto err;
> 
> And here truncate mutex is not locked, but goto err will unlock it. This
> is broken...
> 
>> +}
>> +inode_unlock(dst_inode);
> 
> ...The locking is completely broken in this function anyway. You take the
> lock, look at something, then release the lock. Then what if a write or a
> truncate comes in when the inode is not locked? This is completely
> broken. The inode should be locked with no dio pending when this function
> is called. This is only to check if everything is ok. This has no business
> playing with the inode and truncate locks.
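Something like this instead (sketch; assumes the caller already holds both
inode locks with DIOs drained):

	/* Caller must hold both inode locks; enforce that here. */
	lockdep_assert_held(&src_inode->i_rwsem);
	lockdep_assert_held(&dst_inode->i_rwsem);

with all the checks then done without taking or releasing any lock here.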
> 
>> +
>> +return 0;
>> +err:
>> +mutex_unlock(&dst_zi->i_truncate_mutex);
>> +return -EINVAL;
>> +}
>> +
>> +static ssize_t zonefs_issue_copy(struct zonefs_inode_info *src_zi,
>> +loff_t src_off, struct zonefs_inode_info *dst_zi,
>> +loff_t dst_off, size_t len)
>> +{
>> +struct block_device *src_bdev = src_zi->i_vnode.i_sb->s_bdev;
>> +struct block_device *dst_bdev = dst_zi->i_vnode.i_sb->s_bdev;
>> +struct range_entry *rlist = NULL;
>> +int ret = len;
>> +
>> +rlist = kmalloc(sizeof(*rlist), GFP_KERNEL);
> 
> GFP_NOIO ?
> 
>> +if (!rlist)
>> +return -ENOMEM;
>> +
>> +rlist[0].dst = (dst_zi->i_zsector << SECTOR_SHIFT) + dst_off;
>> +rlist[0].src = (src_zi->i_zsector << SECTOR_SHIFT) + src_off;
>> +rlist[0].len = len;
>> +rlist[0].comp_len = 0;
>> +  

Re: [dm-devel] [PATCH v5 10/10] fs: add support for copy file range in zonefs

2022-11-23 Thread Damien Le Moal
offload? What if the device
does not support offloading?

> + if (ret > 0)
> + ret = zonefs_copy_file(src_file, src_off, dst_file, dst_off,
> +  len, flags);

return here, then no need for the else. But see above. This seems all
broken to me.

> + else if (ret < 0 && ret == -EXDEV)
> + ret = generic_copy_file_range(src_file, src_off, dst_file,
> +   dst_off, len, flags);
> + return ret;
> +}
> +
>  static const struct file_operations zonefs_file_operations = {
>   .open       = zonefs_file_open,
>   .release= zonefs_file_release,
> @@ -1234,6 +1411,7 @@ static const struct file_operations zonefs_file_operations = {
>   .splice_read= generic_file_splice_read,
>   .splice_write   = iter_file_splice_write,
>   .iopoll = iocb_bio_iopoll,
> + .copy_file_range = zonefs_copy_file_range,
>  };
>  
>  static struct kmem_cache *zonefs_inode_cachep;
> @@ -1804,6 +1982,7 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
>   atomic_set(&sbi->s_active_seq_files, 0);
>   sbi->s_max_active_seq_files = bdev_max_active_zones(sb->s_bdev);
>  
> + /* set copy support by default */

What is this comment supposed to be for ?

>   ret = zonefs_read_super(sb);
>   if (ret)
>   return ret;

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH v15 00/13] support zoned block devices with non-power-of-2 zone sizes

2022-09-30 Thread Damien Le Moal
On 10/1/22 04:38, Bart Van Assche wrote:
> On 9/30/22 08:13, Jens Axboe wrote:
>> On 9/29/22 12:31 AM, Pankaj Raghav wrote:
>>>> Hi Jens,
>>>>Please consider this patch series for the 6.1 release.
>>>>
>>>
>>> Hi Jens, Christoph, and Keith,
>>>   All the patches have a Reviewed-by tag at this point. Can we queue this up
>>> for 6.1?
>>
>> It's getting pretty late for 6.1 and I'd really like to have both Christoph
>> and Martin sign off on these changes.
> 
> Hi Jens,
> 
> Agreed that it's getting late for 6.1.
> 
> Since this has not been mentioned in the cover letter, I want to add 
> that in the near future we will need these patches for Android devices. 
> JEDEC is working on supporting zoned storage for UFS devices, the 
> storage devices used in all modern Android phones. Although it would be 
> possible to make the offset between zone starts a power of two by 
> inserting gap zones between data zones, UFS vendors asked not to do this 
> and hence need support for zone sizes that are not a power of two. An 
> advantage of not having to deal with gap zones is better filesystem 
> performance since filesystem extents cannot span gap zones. Having to 
> split filesystem extents because of gap zones reduces filesystem 
> performance.

As mentioned many times, my opinion is that a good implementation should
*not* allow any extent to span zone boundaries. So personally, I do not
consider that argument a valid justification for the non-power-of-2
zone size support.

> 
> Thanks,
> 
> Bart.
> 
> 

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] Please further explain Linux's "zoned storage" roadmap [was: Re: [PATCH v14 00/13] support zoned block devices with non-power-of-2 zone sizes]

2022-09-22 Thread Damien Le Moal
On 9/23/22 04:37, Mike Snitzer wrote:
> On Wed, Sep 21 2022 at  7:55P -0400,
> Damien Le Moal  wrote:
> 
>> On 9/22/22 02:27, Mike Snitzer wrote:
>>> On Tue, Sep 20 2022 at  5:11P -0400,
>>> Pankaj Raghav  wrote:
>>>
>>>> - Background and Motivation:
>>>>
>>>> The zone storage implementation in Linux, introduced since v4.10, first
>>>> targeted SMR drives, which have a power of 2 (po2) zone size alignment
>>>> requirement. The po2 zone size was further imposed implicitly by the
>>>> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
>>>> across chunks beyond the specified size, since v3.16 through commit
>>>> 762380ad9322 ("block: add notion of a chunk size for request merging").
>>>> But this same general block layer po2 requirement for 
>>>> blk_queue_chunk_sectors()
>>>> was removed on v5.10 through commit 07d098e6bbad ("block: allow 
>>>> 'chunk_sectors'
>>>> to be non-power-of-2").
>>>>
>>>> NAND, which is the media used in newer zoned storage devices, does not
>>>> naturally align to po2. In these devices, zone capacity(cap) is not the
>>>> same as the po2 zone size. When the zone cap != zone size, then unmapped
>>>> LBAs are introduced to cover the space between the zone cap and zone size.
>>>> po2 requirement does not make sense for these type of zone storage devices.
>>>> This patch series aims to remove these unmapped LBAs for zoned devices when
>>>> zone cap is npo2. This is done by relaxing the po2 zone size constraint
>>>> in the kernel and allowing zoned device with npo2 zone sizes if zone cap
>>>> == zone size.
>>>>
>>>> Removing the po2 requirement from zone storage should be possible
>>>> now provided that no userspace regression and no performance regressions 
>>>> are
>>>> introduced. Stop-gap patches have been already merged into f2fs-tools to
>>>> proactively not allow npo2 zone sizes until proper support is added [1].
>>>>
>>>> There were two efforts previously to add support to npo2 devices: 1) via
>>>> device level emulation [2] but that was rejected with a final conclusion
>>>> to add support for non po2 zoned device in the complete stack[3] 2)
>>>> adding support to the complete stack by removing the constraint in the
>>>> block layer and NVMe layer with support to btrfs, zonefs, etc which was
>>>> rejected with a conclusion to add a dm target for FS support [0]
>>>> to reduce the regression impact.
>>>>
>>>> This series adds support to npo2 zoned devices in the block and nvme
>>>> layer and a new **dm target** is added: dm-po2zoned-target. This new
>>>> target will be initially used for filesystems such as btrfs and
>>>> f2fs until native npo2 zone support is added.
>>>
>>> As this patchset nears the point of being "ready for merge" and DM's
>>> "zoned" oriented targets are multiplying, I need to understand: where
>>> are we collectively going?  How long are we expecting to support the
>>> "stop-gap zoned storage" layers we've constructed?
>>>
>>> I know https://zonedstorage.io/docs/introduction exists... but it
>>> _seems_ stale given the emergence of ZNS and new permutations of zoned
>>> hardware. Maybe that isn't quite fair (it does cover A LOT!) but I'm
>>> still left wanting (e.g. "bring it all home for me!")...
>>>
>>> Damien, as the most "zoned storage" oriented engineer I know, can you
>>> please kick things off by shedding light on where Linux is now, and
>>> where it's going, for "zoned storage"?
>>
>> Let me first start with what we have seen so far with deployments in the
>> field.
> 
> 
> 
> Thanks for all your insights on zoned storage, very appreciated!
> 
>>> In addition, it was my understanding that WDC had yet another zoned DM
>>> target called "dm-zap" that is for ZNS based devices... It's all a bit
>>> messy in my head (that's on me for not keeping up, but I think we need
>>> a recap!)
>>
>> Since the ZNS specification does not define conventional zones, dm-zoned
>> cannot be used as a standalone DM target (read: single block device) with
>> NVMe zoned block devices. Furthermore, due to its block mapping scheme,
>> dm-zoned does not support devices with zones that have a capacity lower
>

Re: [dm-devel] Please further explain Linux's "zoned storage" roadmap [was: Re: [PATCH v14 00/13] support zoned block devices with non-power-of-2 zone sizes]

2022-09-21 Thread Damien Le Moal
another zoned DM
> target called "dm-zap" that is for ZNS based devices... It's all a bit
> messy in my head (that's on me for not keeping up, but I think we need
> a recap!)

Since the ZNS specification does not define conventional zones, dm-zoned
cannot be used as a standalone DM target (read: single block device) with
NVMe zoned block devices. Furthermore, due to its block mapping scheme,
dm-zoned does not support devices with zones that have a capacity lower
than the zone size. So ZNS is really a big *no* for dm-zoned. dm-zap is a
prototype and in a nutshell is the equivalent of dm-zoned for ZNS. dm-zap
can deal with the smaller zone capacity and does not require conventional
zones. We are not trying to push for dm-zap to be merged for now as we are
still evaluating its potential use cases. We also have a different but
functionally equivalent approach implemented as a block device driver that
we are evaluating internally.

Given the above mentioned usage pattern we have seen so far for zoned
storage, it is not yet clear if something like dm-zap for ZNS is needed
beside some niche use cases.

> So please help me, and others, become more informed as quickly as
> possible! ;)

I hope the above helps. If you want me to develop further any of the
points above, feel free to let me know.

> ps. I'm asking all this in the open on various Linux mailing lists
> because it doesn't seem right to request a concall to inform only
> me... I think others may need similar "zoned storage" help.

All good with me :)

-- 
Damien Le Moal
Western Digital Research




Re: [dm-devel] [PATCH 1/1] dm crypt: Add inline encryption support

2022-01-14 Thread Damien Le Moal






[dm-devel] [PATCH] dm: fix dm_revalidate_zones() memory allocation

2021-06-18 Thread Damien Le Moal
Make sure that the zone write pointer offset array is allocated with a
vmalloc in dm_zone_revalidate_cb() by passing the GFP_KERNEL gfp flag to
kvcalloc(). However, since we do not want to trigger IOs while
revalidating zones, change dm_revalidate_zones() to have the zone scan
done in GFP_NOIO context using memalloc_noio_save/restore calls.

Reported-by: Dan Carpenter 
Fixes: bb37d77239af ("dm: introduce zone append emulation")
Signed-off-by: Damien Le Moal 
---
 drivers/md/dm-zone.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index c2f26949f5ee..6d82a34438c8 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -205,7 +205,7 @@ static int dm_zone_revalidate_cb(struct blk_zone *zone, unsigned int idx,
if (!md->zwp_offset) {
md->zwp_offset =
kvcalloc(q->nr_zones, sizeof(unsigned int),
-GFP_NOIO);
+GFP_KERNEL);
if (!md->zwp_offset)
return -ENOMEM;
}
@@ -230,6 +230,7 @@ static int dm_zone_revalidate_cb(struct blk_zone *zone, unsigned int idx,
 static int dm_revalidate_zones(struct mapped_device *md, struct dm_table *t)
 {
struct request_queue *q = md->queue;
+   unsigned int noio_flag;
int ret;
 
/*
@@ -241,9 +242,14 @@ static int dm_revalidate_zones(struct mapped_device *md, struct dm_table *t)
if (md->nr_zones)
return 0;
 
-   /* Scan all zones to initialize everything */
+   /*
+* Scan all zones to initialize everything. Ensure that all vmalloc
+* operations in this context are done as if GFP_NOIO was specified.
+*/
+   noio_flag = memalloc_noio_save();
ret = dm_blk_do_report_zones(md, t, 0, q->nr_zones,
 dm_zone_revalidate_cb, md);
+   memalloc_noio_restore(noio_flag);
if (ret < 0)
goto err;
if (ret != q->nr_zones) {
-- 
2.31.1




Re: [dm-devel] [PATCH v5 08/11] dm: Forbid requeue of writes to zones

2021-06-04 Thread Damien Le Moal
On 2021/06/04 23:56, Mike Snitzer wrote:
> On Tue, May 25 2021 at  5:24P -0400,
> Damien Le Moal  wrote:
> 
>> A target map method requesting the requeue of a bio with
>> DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
>> unaligned write errors if the bio is a write operation targeting a
>> sequential zone. If a zoned target requests such a requeue, warn about
>> it and kill the IO.
>>
>> The function dm_is_zone_write() is introduced to detect write operations
>> to zoned targets.
>>
>> This change does not affect the target drivers supporting zoned devices
>> and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as
>> none of these targets ever request a requeue.
>>
>> Signed-off-by: Damien Le Moal 
>> Reviewed-by: Hannes Reinecke 
>> Reviewed-by: Himanshu Madhani 
>> ---
>>  drivers/md/dm-zone.c | 17 +
>>  drivers/md/dm.c  | 18 +++---
>>  drivers/md/dm.h  |  5 +
>>  3 files changed, 37 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
>> index b42474043249..edc3bbb45637 100644
>> --- a/drivers/md/dm-zone.c
>> +++ b/drivers/md/dm-zone.c
>> @@ -104,6 +104,23 @@ int dm_report_zones(struct block_device *bdev, sector_t start, sector_t sector,
>>  }
>>  EXPORT_SYMBOL_GPL(dm_report_zones);
>>  
>> +bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
>> +{
>> +struct request_queue *q = md->queue;
>> +
>> +if (!blk_queue_is_zoned(q))
>> +return false;
>> +
>> +switch (bio_op(bio)) {
>> +case REQ_OP_WRITE_ZEROES:
>> +case REQ_OP_WRITE_SAME:
>> +case REQ_OP_WRITE:
>> +return !op_is_flush(bio->bi_opf) && bio_sectors(bio);
>> +default:
>> +return false;
>> +}
>> +}
>> +
>>  void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
>>  {
>>  if (!blk_queue_is_zoned(q))
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index c49976cc4e44..ed8c5a8df2e5 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -846,11 +846,15 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
>>   * Target requested pushing back the I/O.
>>   */
>> spin_lock_irqsave(&md->deferred_lock, flags);
>> -if (__noflush_suspending(md))
>> +if (__noflush_suspending(md) &&
>> +!WARN_ON_ONCE(dm_is_zone_write(md, bio)))
>> /* NOTE early return due to BLK_STS_DM_REQUEUE below */
>> bio_list_add_head(&md->deferred, io->orig_bio);
>>  else
>> -/* noflush suspend was interrupted. */
>> +/*
>> + * noflush suspend was interrupted or this is
>> + * a write to a zoned target.
>> + */
>>  io->status = BLK_STS_IOERR;
>> spin_unlock_irqrestore(&md->deferred_lock, flags);
>>  }
> 
> So I now see this incremental fix:
> https://patchwork.kernel.org/project/dm-devel/patch/20210604004703.408562-1-damien.lem...@opensource.wdc.com/
> 
> And I've folded it in...

Thanks.

>> @@ -947,7 +951,15 @@ static void clone_endio(struct bio *bio)
>> int r = endio(tio->ti, bio, &error);
>>  switch (r) {
>>  case DM_ENDIO_REQUEUE:
>> -error = BLK_STS_DM_REQUEUE;
>> +/*
>> + * Requeuing writes to a sequential zone of a zoned
>> + * target will break the sequential write pattern:
>> + * fail such IO.
>> + */
>> +if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
>> +error = BLK_STS_IOERR;
>> +else
>> +error = BLK_STS_DM_REQUEUE;
>>  fallthrough;
>>  case DM_ENDIO_DONE:
>>  break;
> 
> But I'm left wondering why dec_pending, now dm_io_dec_pending, needs
> to be modified to also check dm_is_zone_write() if clone_endio() is
> already dealing with it?

The way I understand the code is that if the target ->map_bio() method returns
DM_MAPIO_REQ

[dm-devel] [PATCH] dm: fix uninitialized variable use

2021-06-04 Thread Damien Le Moal
From: Damien Le Moal 

In dm_io_dec_pending(), make sure that the bio variable is initialized
to io->orig_bio before using it.

Reported-by: kernel test robot 
Fixes: 3a1e343c53ae ("dm: Forbid requeue of writes to zones")
Signed-off-by: Damien Le Moal 
---
 drivers/md/dm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a89bb9bbe82d..e687d60212b6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -796,6 +796,7 @@ void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
}
 
	if (atomic_dec_and_test(&io->io_count)) {
+   bio = io->orig_bio;
if (io->status == BLK_STS_DM_REQUEUE) {
/*
 * Target requested pushing back the I/O.
@@ -804,7 +805,7 @@ void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
if (__noflush_suspending(md) &&
!WARN_ON_ONCE(dm_is_zone_write(md, bio)))
/* NOTE early return due to BLK_STS_DM_REQUEUE below */
-   bio_list_add_head(&md->deferred, io->orig_bio);
+   bio_list_add_head(&md->deferred, bio);
else
/*
 * noflush suspend was interrupted or this is
@@ -815,7 +816,6 @@ void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
}
 
io_error = io->status;
-   bio = io->orig_bio;
end_io_acct(io);
free_io(md, io);
 
-- 
2.31.1





Re: [dm-devel] dm: Forbid requeue of writes to zones

2021-06-03 Thread Damien Le Moal
On 2021/06/04 7:17, Colin Ian King wrote:
> Hi,
> 
> Static analysis with Coverity on Linux next has found an issue in
> drivers/md/dm.c with the following commit:
> 
> commit 2c243153d1d4be4e23735cd10984ac17c7a54531
> Author: Damien Le Moal 
> Date:   Wed May 26 06:24:58 2021 +0900
> 
> dm: Forbid requeue of writes to zones
> 
> The analysis is as follows:
> 
>  828 static void dec_pending(struct dm_io *io, blk_status_t error)
>  829 {
>  830unsigned long flags;
>  831blk_status_t io_error;
> 
> 1. var_decl: Declaring variable bio without initializer.

This one is related to #7.

> 
>  832struct bio *bio;
>  833struct mapped_device *md = io->md;
>  834
>  835/* Push-back supersedes any I/O errors */
> 
> 2. Condition !!error, taking true branch.
> 
>  836if (unlikely(error)) {
>  837spin_lock_irqsave(&io->endio_lock, flags);
> 
> 3. Condition io->status == 11 /* (blk_status_t)11 */, taking false
> branch.
> 
>  838if (!(io->status == BLK_STS_DM_REQUEUE && __noflush_suspending(md)))
>  839io->status = error;
>  840spin_unlock_irqrestore(&io->endio_lock, flags);
>  841}
>  842

My patch does not touch these hunks. They are as is. So that is not new.

> 
> 4. Condition atomic_dec_and_test(>io_count), taking true branch.
> 
>  843if (atomic_dec_and_test(&io->io_count)) {
> 
> 5. Condition io->status == 11 /* (blk_status_t)11 */, taking true
> branch.
> 
>  844if (io->status == BLK_STS_DM_REQUEUE) {
>  845/*
>  846 * Target requested pushing back the I/O.
>  847 */
>  848spin_lock_irqsave(&md->deferred_lock, flags);
> 
> 6. Condition __noflush_suspending(md), taking true branch.
> 
>  849if (__noflush_suspending(md) &&

I do not understand this one, nor #4.

> 
> Uninitialized pointer read
> 7. uninit_use_in_call: Using uninitialized value bio when calling
> dm_is_zone_write.
> 
>  850!WARN_ON_ONCE(dm_is_zone_write(md, bio)))
 851/* NOTE early return due to BLK_STS_DM_REQUEUE below */
 852bio_list_add_head(&md->deferred, io->orig_bio);

The kernel build robot signaled this one already. Will send an incremental patch
asap today.

> 
> The pointer bio is not initialized and yet is being used in the call to
> function dm_is_zone_write where pointer bio is being accessed. I'm not
> sure what the original intent was, but this looks incorrect.
> 
> Colin
> 


-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH v5 00/11] dm: Improve zoned block device support

2021-06-03 Thread Damien Le Moal
On 2021/06/04 7:16, Mike Snitzer wrote:
> On Thu, Jun 03 2021 at  1:46P -0400,
> Jens Axboe  wrote:
> 
>> On 6/2/21 12:32 PM, Mike Snitzer wrote:
>>> On Tue, Jun 01 2021 at  6:57P -0400,
>>> Damien Le Moal  wrote:
>>>
>>>> On 2021/05/26 6:25, Damien Le Moal wrote:
>>>>> This series improve device mapper support for zoned block devices and
>>>>> of targets exposing a zoned device.
>>>>
>>>> Mike, Jens,
>>>>
>>>> Any feedback regarding this series ?
>>>>
>>>>>
>>>>> The first patch improves support for user requests to reset all zones of
>>>>> the target device. With the fix, such an operation behaves similarly to
>>>>> the implementation for physical block devices, based on the single zone
>>>>> reset command with the ALL bit set.
>>>>>
>>>>> The following 2 patches are preparatory block layer patches.
>>>>>
>>>>> Patch 4 and 5 are 2 small fixes to DM core zoned block device support.
>>>>>
>>>>> Patch 6 reorganizes DM core code, moving conditionally defined zoned
>>>>> block device code into the new dm-zone.c file. This avoids sprinkling DM
>>>>> with zone-related code defined under an #ifdef CONFIG_BLK_DEV_ZONED.
>>>>>
>>>>> Patch 7 improves DM zone report helper functions for target drivers.
>>>>>
>>>>> Patch 8 fixes a potential problem with BIO requeue on a zoned target.
>>>>>
>>>>> Finally, patch 9 to 11 implement zone append emulation using regular
>>>>> writes for target drivers that cannot natively support this BIO type.
>>>>> The only target currently needing this emulation is dm-crypt. With this
>>>>> change, a zoned dm-crypt device behaves exactly like a regular zoned
>>>>> block device, correctly executing user zone append BIOs.
>>>>>
>>>>> This series passes the following tests:
>>>>> 1) zonefs tests on top of dm-crypt with a zoned nullblk device
>>>>> 2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
>>>>> 3) btrfs fstests on top of dm-crypt with zoned nullblk devices.
>>>>>
>>>>> Comments are as always welcome.
>>>
>>> I've picked up DM patches 4-8 because they didn't depend on the first
>>> 3 block patches.
>>>
>>> But I'm fine with picking up 1-3 if Jens provides his Acked-by.
>>> And then I can pickup the remaining DM patches 9-11.
>>
>> I'm fine with 1-3, you can add my Acked-by to those.
> 
> Thanks, did so.
> 
> Damien: I've staged this patchset in linux-next via the dm-5.14 branch of 
> linux-dm.git

Thanks !

> Might look at optimizing the fast-path of __map_bio further, e.g. this
> leaves something to be desired considering how niche this all is:
> 
> /*
>  * Check if the IO needs a special mapping due to zone append 
> emulation
>  * on zoned target. In this case, dm_zone_map_bio() calls the target
>  * map operation.
>  */
> if (dm_emulate_zone_append(io->md))
> r = dm_zone_map_bio(tio);
> else
> r = ti->type->map(ti, clone);
> 
> Does it make sense to split out a new CONFIG_ that encapsulates legacy
> zoned devices?

Well, the problem is that there are no "legacy" zoned devices: they all support
zone append commands, either natively (for zns and nullblk) or emulated in their
respective drivers (scsi sd for SMR drives). So I do not think that a new
CONFIG_XXX can be used.

What we could do is have this conditional on either:
(1) the DM targets that need it: dm-crypt only for now (CONFIG_DM_CRYPT)
(2) Zone append command users: btrfs and zonefs only for now (CONFIG_FS_BTRFS
and CONFIG_FS_ZONEFS)

(1) would be the nicest as we can keep this contained in DM and define something
in KConfig. However, given that most distros (if not all) enable dm-crypt, I am
not convinced this will do any good.

Note that for the !CONFIG_BLK_DEV_ZONED case, the "if
(dm_emulate_zone_append(io->md))" is compiled out, resulting in the same code as
without the emulation.
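That is, something like this in dm-core.h (sketch; the actual definition
may differ slightly):

#ifdef CONFIG_BLK_DEV_ZONED
static inline bool dm_emulate_zone_append(struct mapped_device *md)
{
	if (blk_queue_is_zoned(md->queue))
		return test_bit(DMF_EMULATE_ZONE_APPEND, &md->flags);
	return false;
}
#else
static inline bool dm_emulate_zone_append(struct mapped_device *md)
{
	/* Constant false: the compiler drops the dm_zone_map_bio() branch */
	return false;
}
#endif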

-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [dm:for-next 20/20] drivers/md/dm.c:850:43: warning: variable 'bio' is uninitialized when used here

2021-06-02 Thread Damien Le Moal
   }
>861
>862io_error = io->status;
>863bio = io->orig_bio;
>864end_io_acct(io);
>865free_io(md, io);
>866
>867if (io_error == BLK_STS_DM_REQUEUE)
>868return;
>869
>870if ((bio->bi_opf & REQ_PREFLUSH) && 
> bio->bi_iter.bi_size) {
>871/*
>872 * Preflush done for flush with data, 
> reissue
>873 * without REQ_PREFLUSH.
>874 */
>875bio->bi_opf &= ~REQ_PREFLUSH;
>876queue_io(md, bio);
>877} else {
>878    /* done with normal IO or empty flush */
>879if (io_error)
>880bio->bi_status = io_error;
>881bio_endio(bio);
>882}
>883}
>884}
>885
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
> 


-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH v5 00/11] dm: Improve zoned block device support

2021-06-01 Thread Damien Le Moal
On 2021/05/26 6:25, Damien Le Moal wrote:
> This series improves device mapper support for zoned block devices and
> for targets exposing a zoned device.

Mike, Jens,

Any feedback regarding this series ?

> 
> The first patch improves support for user requests to reset all zones of
> the target device. With the fix, such an operation behaves similarly to
> the implementation for physical block devices, based on the single zone
> reset command with the ALL bit set.
> 
> The following 2 patches are preparatory block layer patches.
> 
> Patch 4 and 5 are 2 small fixes to DM core zoned block device support.
> 
> Patch 6 reorganizes DM core code, moving conditionally defined zoned
> block device code into the new dm-zone.c file. This avoids sprinkling DM
> with zone-related code defined under an #ifdef CONFIG_BLK_DEV_ZONED.
> 
> Patch 7 improves DM zone report helper functions for target drivers.
> 
> Patch 8 fixes a potential problem with BIO requeue on a zoned target.
> 
> Finally, patch 9 to 11 implement zone append emulation using regular
> writes for target drivers that cannot natively support this BIO type.
> The only target currently needing this emulation is dm-crypt. With this
> change, a zoned dm-crypt device behaves exactly like a regular zoned
> block device, correctly executing user zone append BIOs.
> 
> This series passes the following tests:
> 1) zonefs tests on top of dm-crypt with a zoned nullblk device
> 2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
> 3) btrfs fstests on top of dm-crypt with zoned nullblk devices.
> 
> Comments are as always welcome.
> 
> Changes from v4:
> * Remove useless extra space in variable initialization in patch 1
> * Shorten dm_accept_partial_bio() documentation comment in patch 4
> * Added reviewed-by tags
> 
> Changes from v3:
> * Fixed missing variable initialization in
>   blkdev_zone_reset_all_emulated() in patch 1.
> * Rebased on rc3
> * Added reviewed-by tags
> 
> Changes from v2:
> * Replace use of spinlock to protect the zone write pointer offset
>   array in patch 11 with READ_ONCE/WRITE_ONCE as suggested by Hannes.
> * Added reviewed-by tags
> 
> Changes from v1:
> * Use Christoph proposed approach for patch 1 (split reset all
>   processing into different functions)
> * Changed helpers introduced in patch 2 to remove the request_queue
>   argument
> * Improve patch 3 commit message as suggested by Christoph (explaining
>   that the flag is a special case that cannot use a REQ_XXX flag)
> * Changed DMWARN() into DMDEBUG in patch 11 as suggested by Milan
> * Added reviewed-by tags
> 
> Damien Le Moal (11):
>   block: improve handling of all zones reset operation
>   block: introduce bio zone helpers
>   block: introduce BIO_ZONE_WRITE_LOCKED bio flag
>   dm: Fix dm_accept_partial_bio()
>   dm: cleanup device_area_is_invalid()
>   dm: move zone related code to dm-zone.c
>   dm: Introduce dm_report_zones()
>   dm: Forbid requeue of writes to zones
>   dm: rearrange core declarations
>   dm: introduce zone append emulation
>   dm crypt: Fix zoned block device support
> 
>  block/blk-zoned.c | 119 +--
>  drivers/md/Makefile   |   4 +
>  drivers/md/dm-core.h  |  65 
>  drivers/md/dm-crypt.c |  31 +-
>  drivers/md/dm-flakey.c|   7 +-
>  drivers/md/dm-linear.c|   7 +-
>  drivers/md/dm-table.c |  21 +-
>  drivers/md/dm-zone.c  | 654 ++
>  drivers/md/dm.c   | 201 +++
>  drivers/md/dm.h   |  30 +-
>  include/linux/blk_types.h |   1 +
>  include/linux/blkdev.h|  12 +
>  include/linux/device-mapper.h |   9 +-
>  13 files changed, 954 insertions(+), 207 deletions(-)
>  create mode 100644 drivers/md/dm-zone.c
> 


-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH] dm zoned: check zone capacity

2021-05-28 Thread Damien Le Moal
On 2021/05/19 10:26, Damien Le Moal wrote:
> The dm-zoned target cannot support zoned block devices with zones that
> have a capacity smaller than the zone size (e.g. NVMe zoned namespaces)
> due to the current chunk zone mapping implementation as it is assumed
> that zones and chunks have the same size with all blocks usable.
> If a zoned drive is found to have zones with a capacity different from
> the zone size, fail the target initialization.
> 
> Signed-off-by: Damien Le Moal 
> ---
>  drivers/md/dm-zoned-metadata.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 039d17b28938..ee4626d08557 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1390,6 +1390,13 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>   return -ENXIO;
>   }
>  
> + /*
> +  * Devices that have zones with a capacity smaller than the zone size
> +  * (e.g. NVMe zoned namespaces) are not supported.
> +  */
> + if (blkz->capacity != blkz->len)
> + return -ENXIO;
> +
>   switch (blkz->type) {
>   case BLK_ZONE_TYPE_CONVENTIONAL:
>   set_bit(DMZ_RND, &zone->flags);
> 

Hi Mike,

I just realized that I forgot to add:

Cc: sta...@vger.kernel.org # v5.9+

to this patch.

Should I resend ? Or can you add this tag when applying the patch ?

Thanks !

-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH v2] dm-kcopyd: avoid useless atomic operations

2021-05-26 Thread Damien Le Moal
On 2021/05/26 23:16, Mikulas Patocka wrote:
> 
> 
> On Tue, 25 May 2021, Damien Le Moal wrote:
> 
>> On 2021/05/26 4:50, Mikulas Patocka wrote:
>>> The functions set_bit and clear_bit are atomic. We don't need atomicity
>>> when making flags for dm-kcopyd. So, change them to direct manipulation of
>>> the flags.
>>>
>>> Signed-off-by: Mikulas Patocka 
>>>
>>> Index: linux-2.6/drivers/md/dm-kcopyd.c
>>> ===
>>> --- linux-2.6.orig/drivers/md/dm-kcopyd.c
>>> +++ linux-2.6/drivers/md/dm-kcopyd.c
>>> @@ -812,7 +812,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_cli
>>> if (!test_bit(DM_KCOPYD_WRITE_SEQ, >flags)) {
>>> for (i = 0; i < job->num_dests; i++) {
>>> if (bdev_zoned_model(dests[i].bdev) == BLK_ZONED_HM) {
>>> -   set_bit(DM_KCOPYD_WRITE_SEQ, >flags);
>>> +   job->flags |= 1UL << DM_KCOPYD_WRITE_SEQ;
>>
>> How about using the BIT() macro ?
>>
>> job->flags |= BIT(DM_KCOPYD_WRITE_SEQ);
>>
>> But I know some do not like that macro :)
> 
> Yes - it is better to use it.
> I also changed flags from unsigned long to unsigned, to make the structure 
> smaller.
> 
> 
> From: Mikulas Patocka 
> 
> dm-kcopyd: avoid useless atomic operations
> 
> The functions set_bit and clear_bit are atomic. We don't need atomicity
> when making flags for dm-kcopyd. So, change them to direct manipulation of
> the flags.
> 
> Signed-off-by: Mikulas Patocka 
> 
> Index: linux-2.6/drivers/md/dm-kcopyd.c
> ===
> --- linux-2.6.orig/drivers/md/dm-kcopyd.c
> +++ linux-2.6/drivers/md/dm-kcopyd.c
> @@ -341,7 +341,7 @@ static void client_free_pages(struct dm_
>  struct kcopyd_job {
>   struct dm_kcopyd_client *kc;
>   struct list_head list;
> - unsigned long flags;
> + unsigned flags;

This triggers a checkpatch warning. "unsigned int" would be better.

>  
>   /*
>* Error state of the job.
> @@ -418,7 +418,7 @@ static struct kcopyd_job *pop_io_job(str
>* constraint and sequential writes that are at the right position.
>*/
>   list_for_each_entry(job, jobs, list) {
> - if (job->rw == READ || !test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags)) {
> + if (job->rw == READ || !(job->flags & BIT(DM_KCOPYD_WRITE_SEQ))) {
>   list_del(>list);
>   return job;
>   }
> @@ -529,7 +529,7 @@ static void complete_io(unsigned long er
>   else
>   job->read_err = 1;
>  
> - if (!test_bit(DM_KCOPYD_IGNORE_ERROR, &job->flags)) {
> + if (!(job->flags & BIT(DM_KCOPYD_IGNORE_ERROR))) {
>   push(&kc->complete_jobs, job);
>   wake(kc);
>   return;
> @@ -572,7 +572,7 @@ static int run_io_job(struct kcopyd_job
>* If we need to write sequentially and some reads or writes failed,
>* no point in continuing.
>*/
> - if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) &&
> + if (job->flags & BIT(DM_KCOPYD_WRITE_SEQ) &&
>   job->master_job->write_err) {
>   job->write_err = job->master_job->write_err;
>   return -EIO;
> @@ -716,7 +716,7 @@ static void segment_complete(int read_er
>* Only dispatch more work if there hasn't been an error.
>*/
>   if ((!job->read_err && !job->write_err) ||
> - test_bit(DM_KCOPYD_IGNORE_ERROR, &job->flags)) {
> + job->flags & BIT(DM_KCOPYD_IGNORE_ERROR)) {
>   /* get the next chunk of work */
>   progress = job->progress;
>   count = job->source.count - progress;
> @@ -809,10 +809,10 @@ void dm_kcopyd_copy(struct dm_kcopyd_cli
>* we need to write sequentially. If one of the destination is a
>* host-aware device, then leave it to the caller to choose what to do.
>*/
> - if (!test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags)) {
> + if (!(job->flags & BIT(DM_KCOPYD_WRITE_SEQ))) {
>   for (i = 0; i < job->num_dests; i++) {
>   if (bdev_zoned_model(dests[i].bdev) == BLK_ZONED_HM) {
> - set_bit(DM_KCOPYD_WRITE_SEQ, &job->flags);
> + job->flags |= BIT(DM_KCOPYD_WRITE_SEQ);
>   break;
>

Re: [dm-devel] [PATCH] dm-kcopyd avoid spin_lock_irqsave from process context

2021-05-26 Thread Damien Le Moal
On 2021/05/26 23:18, Mikulas Patocka wrote:
> The functions "pop", "push_head", "do_work" can only be called from
> process context. Therefore, we can replace
> spin_lock_irqsave/spin_unlock_irqrestore with
> spin_lock_irq/spin_unlock_irq.
> 
> Signed-off-by: Mikulas Patocka 
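i.e., hunks of this shape (illustrative only, not the exact patch):

-	spin_lock_irqsave(&kc->job_lock, flags);
+	spin_lock_irq(&kc->job_lock);
 	...
-	spin_unlock_irqrestore(&kc->job_lock, flags);
+	spin_unlock_irq(&kc->job_lock);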

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH] dm-kcopyd: avoid useless atomic operations

2021-05-25 Thread Damien Le Moal
On 2021/05/26 4:50, Mikulas Patocka wrote:
> The functions set_bit and clear_bit are atomic. We don't need atomicity
> when making flags for dm-kcopyd. So, change them to direct manipulation of
> the flags.
> 
> Signed-off-by: Mikulas Patocka 
> 
> Index: linux-2.6/drivers/md/dm-kcopyd.c
> ===
> --- linux-2.6.orig/drivers/md/dm-kcopyd.c
> +++ linux-2.6/drivers/md/dm-kcopyd.c
> @@ -812,7 +812,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_cli
>   if (!test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags)) {
>   for (i = 0; i < job->num_dests; i++) {
>   if (bdev_zoned_model(dests[i].bdev) == BLK_ZONED_HM) {
> - set_bit(DM_KCOPYD_WRITE_SEQ, &job->flags);
> + job->flags |= 1UL << DM_KCOPYD_WRITE_SEQ;

How about using the BIT() macro ?

job->flags |= BIT(DM_KCOPYD_WRITE_SEQ);

But I know some do not like that macro :)


>   break;
>   }
>   }
> @@ -823,7 +823,7 @@ void dm_kcopyd_copy(struct dm_kcopyd_cli
>*/
>   if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) &&
>   test_bit(DM_KCOPYD_IGNORE_ERROR, &job->flags))
> - clear_bit(DM_KCOPYD_IGNORE_ERROR, &job->flags);
> + job->flags &= ~(1UL << DM_KCOPYD_IGNORE_ERROR);
>  
>   if (from) {
>   job->source = *from;
> Index: linux-2.6/drivers/md/dm-raid1.c
> ===
> --- linux-2.6.orig/drivers/md/dm-raid1.c
> +++ linux-2.6/drivers/md/dm-raid1.c
> @@ -364,7 +364,7 @@ static void recover(struct mirror_set *m
>  
>   /* hand to kcopyd */
>   if (!errors_handled(ms))
> - set_bit(DM_KCOPYD_IGNORE_ERROR, &flags);
> + flags |= 1UL << DM_KCOPYD_IGNORE_ERROR;
>  
>   dm_kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1, to,
>  flags, recovery_complete, reg);
> Index: linux-2.6/drivers/md/dm-zoned-reclaim.c
> ===
> --- linux-2.6.orig/drivers/md/dm-zoned-reclaim.c
> +++ linux-2.6/drivers/md/dm-zoned-reclaim.c
> @@ -134,7 +134,7 @@ static int dmz_reclaim_copy(struct dmz_r
>   dst_zone_block = dmz_start_block(zmd, dst_zone);
>  
>   if (dmz_is_seq(dst_zone))
> - set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
> + flags |= 1UL << DM_KCOPYD_WRITE_SEQ;
>  
>   while (block < end_block) {
>       if (src_zone->dev->flags & DMZ_BDEV_DYING)
> 

Either way, looks fine to me.

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research






[dm-devel] [PATCH v5 06/11] dm: move zone related code to dm-zone.c

2021-05-25 Thread Damien Le Moal
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device request queue zone attributes in
dm_table_set_restrictions().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/Makefile   |   4 ++
 drivers/md/dm-table.c |  14 ++
 drivers/md/dm-zone.c  | 102 ++
 drivers/md/dm.c   |  78 
 drivers/md/dm.h   |  11 +
 5 files changed, 120 insertions(+), 89 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index ef7ddc27685c..a74aaf8b1445 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -92,6 +92,10 @@ ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs+= dm-uevent.o
 endif
 
+ifeq ($(CONFIG_BLK_DEV_ZONED),y)
+dm-mod-objs+= dm-zone.o
+endif
+
 ifeq ($(CONFIG_DM_VERITY_FEC),y)
 dm-verity-objs += dm-verity-fec.o
 endif
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 21fd9cd4da32..dd9f648ab598 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -2064,17 +2064,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /*
-* For a zoned target, the number of zones should be updated for the
-* correct value to be exposed in sysfs queue/nr_zones. For a BIO based
-* target, this is all that is needed.
-*/
-#ifdef CONFIG_BLK_DEV_ZONED
-   if (blk_queue_is_zoned(q)) {
-   WARN_ON_ONCE(queue_is_mq(q));
-   q->nr_zones = blkdev_nr_zones(t->md->disk);
-   }
-#endif
+   /* For a zoned target, setup the zones related queue attributes */
+   if (blk_queue_is_zoned(q))
+   dm_set_zones_restrictions(t, q);
 
dm_update_keyslot_manager(q, t);
blk_queue_update_readahead(q);
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
new file mode 100644
index ..3243c42b7951
--- /dev/null
+++ b/drivers/md/dm-zone.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+
+#include "dm-core.h"
+
+/*
+ * User facing dm device block device report zone operation. This calls the
+ * report_zones operation for each target of a device table. This operation is
+ * generally implemented by targets using dm_report_zones().
+ */
+int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
+   unsigned int nr_zones, report_zones_cb cb, void *data)
+{
+   struct mapped_device *md = disk->private_data;
+   struct dm_table *map;
+   int srcu_idx, ret;
+   struct dm_report_zones_args args = {
+   .next_sector = sector,
+   .orig_data = data,
+   .orig_cb = cb,
+   };
+
+   if (dm_suspended_md(md))
+   return -EAGAIN;
+
+   map = dm_get_live_table(md, &srcu_idx);
+   if (!map) {
+   ret = -EIO;
+   goto out;
+   }
+
+   do {
+   struct dm_target *tgt;
+
+   tgt = dm_table_find_target(map, args.next_sector);
+   if (WARN_ON_ONCE(!tgt->type->report_zones)) {
+   ret = -EIO;
+   goto out;
+   }
+
+   args.tgt = tgt;
+   ret = tgt->type->report_zones(tgt, &args,
+ nr_zones - args.zone_idx);
+   if (ret < 0)
+   goto out;
+   } while (args.zone_idx < nr_zones &&
+args.next_sector < get_capacity(disk));
+
+   ret = args.zone_idx;
+out:
+   dm_put_live_table(md, srcu_idx);
+   return ret;
+}
+
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+{
+   struct dm_report_zones_args *args = data;
+   sector_t sector_diff = args->tgt->begin - args->start;
+
+   /*
+* Ignore zones beyond the target range.
+*/
+   if (zone->start >= args->start + args->tgt->len)
+   return 0;
+
+   /*
+* Remap the start sector and write pointer position of the zone
+* to match its position in the target range.
+*/
+   zone->start += sector_diff;
+   if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
+   if (zone->cond == BLK_ZONE_COND_FULL)
+   zone->wp = zone->start + zone->len;
+

[dm-devel] [PATCH v5 00/11] dm: Improve zoned block device support

2021-05-25 Thread Damien Le Moal
This series improves device mapper support for zoned block devices and
for targets exposing a zoned device.

The first patch improves support for user requests to reset all zones of
the target device. With the fix, such an operation behaves similarly to
the implementation for physical block devices, based on the single zone
reset command with the ALL bit set.

The following 2 patches are preparatory block layer patches.

Patch 4 and 5 are 2 small fixes to DM core zoned block device support.

Patch 6 reorganizes DM core code, moving conditionally defined zoned
block device code into the new dm-zone.c file. This avoids sprinkling DM
with zone-related code defined under an #ifdef CONFIG_BLK_DEV_ZONED.

Patch 7 improves DM zone report helper functions for target drivers.

Patch 8 fixes a potential problem with BIO requeue on a zoned target.

Finally, patch 9 to 11 implement zone append emulation using regular
writes for target drivers that cannot natively support this BIO type.
The only target currently needing this emulation is dm-crypt. With this
change, a zoned dm-crypt device behaves exactly like a regular zoned
block device, correctly executing user zone append BIOs.

This series passes the following tests:
1) zonefs tests on top of dm-crypt with a zoned nullblk device
2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
3) btrfs fstests on top of dm-crypt with zoned nullblk devices.

Comments are as always welcome.

Changes from v4:
* Remove useless extra space in variable initialization in patch 1
* Shorten dm_accept_partial_bio() documentation comment in patch 4
* Added reviewed-by tags

Changes from v3:
* Fixed missing variable initialization in
  blkdev_zone_reset_all_emulated() in patch 1.
* Rebased on rc3
* Added reviewed-by tags

Changes from v2:
* Replace use of spinlock to protect the zone write pointer offset
  array in patch 11 with READ_ONCE/WRITE_ONCE as suggested by Hannes.
* Added reviewed-by tags

Changes from v1:
* Use Christoph proposed approach for patch 1 (split reset all
  processing into different functions)
* Changed helpers introduced in patch 2 to remove the request_queue
  argument
* Improve patch 3 commit message as suggested by Christoph (explaining
  that the flag is a special case that cannot use a REQ_XXX flag)
* Changed DMWARN() into DMDEBUG in patch 11 as suggested by Milan
* Added reviewed-by tags

Damien Le Moal (11):
  block: improve handling of all zones reset operation
  block: introduce bio zone helpers
  block: introduce BIO_ZONE_WRITE_LOCKED bio flag
  dm: Fix dm_accept_partial_bio()
  dm: cleanup device_area_is_invalid()
  dm: move zone related code to dm-zone.c
  dm: Introduce dm_report_zones()
  dm: Forbid requeue of writes to zones
  dm: rearrange core declarations
  dm: introduce zone append emulation
  dm crypt: Fix zoned block device support

 block/blk-zoned.c | 119 +--
 drivers/md/Makefile   |   4 +
 drivers/md/dm-core.h  |  65 
 drivers/md/dm-crypt.c |  31 +-
 drivers/md/dm-flakey.c|   7 +-
 drivers/md/dm-linear.c|   7 +-
 drivers/md/dm-table.c |  21 +-
 drivers/md/dm-zone.c  | 654 ++
 drivers/md/dm.c   | 201 +++
 drivers/md/dm.h   |  30 +-
 include/linux/blk_types.h |   1 +
 include/linux/blkdev.h|  12 +
 include/linux/device-mapper.h |   9 +-
 13 files changed, 954 insertions(+), 207 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

-- 
2.31.1




[dm-devel] [PATCH v5 11/11] dm crypt: Fix zoned block device support

2021-05-25 Thread Damien Le Moal
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cipher block IV. All data written in a zone ends up using the same IV
values, corresponding to the first sectors of the zone, but read
operations will specify the actual sector of the data within the zone,
resulting in an IV mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in zone append operations being emulated
using regular write operations.
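
For illustration only (a sketch of the failure mode, not code from this
patch): a plain64-style IV generator simply encodes the sector used at
submission time, so every zone append write, always submitted with the
zone start sector, would get the zone-start IV while reads use the real
data location.

static void sector_based_iv(u8 *iv, unsigned int iv_size, sector_t sector)
{
	memset(iv, 0, iv_size);
	/* For REQ_OP_ZONE_APPEND this sector is the zone start, but
	 * reads later pass the actual data sector: the IVs differ. */
	*(__le64 *)iv = cpu_to_le64((u64)sector);
}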

Reported-by: Shin'ichiro Kawasaki 
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-crypt.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..50f4cbd600d5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int 
argc, char **argv)
}
cc->start = tmpll;
 
-   /*
-* For zoned block devices, we need to preserve the issuer write
-* ordering. To do so, disable write workqueues and force inline
-* encryption completion.
-*/
if (bdev_is_zoned(cc->dev->bdev)) {
+   /*
+* For zoned block devices, we need to preserve the issuer write
+* ordering. To do so, disable write workqueues and force inline
+* encryption completion.
+*/
set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, >flags);
set_bit(DM_CRYPT_WRITE_INLINE, >flags);
+
+   /*
+* All zone append writes to a zone of a zoned block device will
+* have the same BIO sector, the start of the zone. When the
+* cypher IV mode uses sector values, all data targeting a
+* zone will be encrypted using the first sector numbers of the
+* zone. This will not result in write errors but will
+* cause most reads to fail as reads will use the sector values
+* for the actual data locations, resulting in IV mismatch.
+* To avoid this problem, ask DM core to emulate zone append
+* operations with regular writes.
+*/
+   DMDEBUG("Zone append operations will be emulated");
+   ti->emulate_zone_append = true;
}
 
if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
-- 
2.31.1




[dm-devel] [PATCH v5 04/11] dm: Fix dm_accept_partial_bio()

2021-05-25 Thread Damien Le Moal
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed, as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.

Blank lines are added around the group of BUG_ON() calls to make the
code more legible.
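
For context, a typical legitimate use looks like this (illustrative
sketch only; the target, its context structure and the chunk size are
hypothetical):

struct example_ctx {
	struct dm_dev *dev;
};

static int example_map(struct dm_target *ti, struct bio *bio)
{
	struct example_ctx *ec = ti->private;
	sector_t chunk = 256; /* hypothetical internal boundary */
	sector_t offset = dm_target_offset(ti, bio->bi_iter.bi_sector);
	unsigned int len = chunk - (offset & (chunk - 1));

	/* Accept only the sectors up to the boundary; DM resubmits the
	 * remainder as a new BIO. This is never valid for zone
	 * management or zone append BIOs, which the new BUG_ON() calls
	 * catch. */
	if (bio_sectors(bio) > len)
		dm_accept_partial_bio(bio, len);

	bio_set_dev(bio, ec->dev->bdev);
	return DM_MAPIO_REMAPPED;
}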

Signed-off-by: Damien Le Moal 
---
 drivers/md/dm.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..11af20080639 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1237,8 +1237,8 @@ static int dm_dax_zero_page_range(struct dax_device 
*dax_dev, pgoff_t pgoff,
 
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
- * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
- * REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH.
+ * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_* zone management
+ * operations and REQ_OP_ZONE_APPEND (zone append writes).
  *
  * dm_accept_partial_bio informs the dm that the target only wants to process
  * additional n_sectors sectors of the bio and the rest of the data should be
@@ -1268,9 +1268,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned 
n_sectors)
 {
struct dm_target_io *tio = container_of(bio, struct dm_target_io, 
clone);
unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+
BUG_ON(bio->bi_opf & REQ_PREFLUSH);
+   BUG_ON(op_is_zone_mgmt(bio_op(bio)));
+   BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
BUG_ON(bi_size > *tio->len_ptr);
BUG_ON(n_sectors > bi_size);
+
*tio->len_ptr -= bi_size - n_sectors;
bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
 }
-- 
2.31.1




[dm-devel] [PATCH v5 10/11] dm: introduce zone append emulation

2021-05-25 Thread Damien Le Moal
For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.

To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intercepted in __map_bio() and
processed using the new function dm_zone_map_bio().

Detection of the target's ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operations, either because it explicitly requests emulation
using the new struct dm_target field emulate_zone_append, or because the
device table contains a non-zoned device, has its mapped device marked
with the new flag DMF_EMULATE_ZONE_APPEND. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.

Atomicity of the zone write pointer tracking and updates is ensured using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO is failed
before submission or when dm_zone_endio() is called when the clone BIO
completes.

The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().

For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.

All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all
dm zone related functions.
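
The core of the remapping can be sketched as follows (simplified from
the actual dm_zone_map_bio() logic, without the zone write locking and
error handling described above):

static void example_remap_zone_append(struct mapped_device *md,
				      struct bio *clone)
{
	unsigned int zno = bio_zone_no(clone);

	/* A zone append BIO carries the zone start sector; redirect it
	 * to the tracked write pointer position as a regular write. */
	clone->bi_opf &= ~REQ_OP_MASK;
	clone->bi_opf |= REQ_OP_WRITE | REQ_NOMERGE;
	clone->bi_iter.bi_sector += md->zwp_offset[zno];

	/* Account for the sectors this write consumes. */
	md->zwp_offset[zno] += bio_sectors(clone);
}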

Signed-off-by: Damien Le Moal 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-core.h  |  13 +
 drivers/md/dm-table.c |  19 +-
 drivers/md/dm-zone.c  | 580 --
 drivers/md/dm.c   |  38 ++-
 drivers/md/dm.h   |  16 +-
 include/linux/device-mapper.h |   6 +
 6 files changed, 618 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index cfabc1c91f9f..edc1553c4eea 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -114,6 +114,11 @@ struct mapped_device {
bool init_tio_pdu:1;
 
struct srcu_struct io_barrier;
+
+#ifdef CONFIG_BLK_DEV_ZONED
+   unsigned int nr_zones;
+   unsigned int *zwp_offset;
+#endif
 };
 
 /*
@@ -128,6 +133,7 @@ struct mapped_device {
 #define DMF_DEFERRED_REMOVE 6
 #define DMF_SUSPENDED_INTERNALLY 7
 #define DMF_POST_SUSPENDING 8
+#define DMF_EMULATE_ZONE_APPEND 9
 
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
@@ -143,6 +149,13 @@ static inline struct dm_stats *dm_get_stats(struct 
mapped_device *md)
return >stats;
 }
 
+static inline bool dm_emulate_zone_append(struct mapped_device *md)
+{
+   if (blk_queue_is_zoned(md->queue))
+   return test_bit(DMF_EMULATE_ZONE_APPEND, >flags);
+   return false;
+}
+
 #define DM_TABLE_MAX_DEPTH 16
 
 struct dm_table {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index dd9f648ab598..21fdccfb16cf 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1981,11 +1981,12 @@ static int device_requires_stable_pages(struct 
dm_target *ti,
return blk_queue_stable_writes(q);
 }
 
-void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
-  struct queue_limits *limits)
+int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
+ struct queue_limits *limits)
 {
bool wc = false, fua = false;
int page_size = PAGE_SIZE;
+   int r;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -2064,12 +2065,20 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /* Fo

[dm-devel] [PATCH v5 05/11] dm: cleanup device_area_is_invalid()

2021-05-25 Thread Damien Le Moal
In device_area_is_invalid(), use bdev_is_zoned() instead of open
coding the test on the zoned model returned by bdev_zoned_model().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ee47a332b462..21fd9cd4da32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -249,7 +249,7 @@ static int device_area_is_invalid(struct dm_target *ti, 
struct dm_dev *dev,
 * If the target is mapped to zoned block device(s), check
 * that the zones are not partially mapped.
 */
-   if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {
+   if (bdev_is_zoned(bdev)) {
unsigned int zone_sectors = bdev_zone_sectors(bdev);
 
if (start & (zone_sectors - 1)) {
-- 
2.31.1




[dm-devel] [PATCH v5 09/11] dm: rearrange core declarations

2021-05-25 Thread Damien Le Moal
Move the definitions of struct dm_target_io, struct dm_io and of the
bits of the flags field of struct mapped_device from dm.c to dm-core.h
to make them usable from dm-zone.c.
For the same reason, declare dec_pending() in dm-core.h after renaming
it to dm_io_dec_pending(). And for symmetry of the function names,
introduce the inline helper dm_io_inc_pending() instead of directly
using atomic_inc() calls.
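
With the rename, the pairing reads as follows (illustrative sketch only,
outside of any real call site):

	dm_io_inc_pending(io);		/* was atomic_inc(&io->io_count) */
	/* ... clone and submit BIOs ... */
	dm_io_dec_pending(io, BLK_STS_OK);	/* was dec_pending(io, error) */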

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-core.h | 52 ++
 drivers/md/dm.c  | 59 ++--
 2 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..cfabc1c91f9f 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -116,6 +116,19 @@ struct mapped_device {
struct srcu_struct io_barrier;
 };
 
+/*
+ * Bits for the flags field of struct mapped_device.
+ */
+#define DMF_BLOCK_IO_FOR_SUSPEND 0
+#define DMF_SUSPENDED 1
+#define DMF_FROZEN 2
+#define DMF_FREEING 3
+#define DMF_DELETING 4
+#define DMF_NOFLUSH_SUSPENDING 5
+#define DMF_DEFERRED_REMOVE 6
+#define DMF_SUSPENDED_INTERNALLY 7
+#define DMF_POST_SUSPENDING 8
+
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
 void disable_write_zeroes(struct mapped_device *md);
@@ -173,6 +186,45 @@ struct dm_table {
 #endif
 };
 
+/*
+ * One of these is allocated per clone bio.
+ */
+#define DM_TIO_MAGIC 7282014
+struct dm_target_io {
+   unsigned int magic;
+   struct dm_io *io;
+   struct dm_target *ti;
+   unsigned int target_bio_nr;
+   unsigned int *len_ptr;
+   bool inside_dm_io;
+   struct bio clone;
+};
+
+/*
+ * One of these is allocated per original bio.
+ * It contains the first clone used for that original.
+ */
+#define DM_IO_MAGIC 5191977
+struct dm_io {
+   unsigned int magic;
+   struct mapped_device *md;
+   blk_status_t status;
+   atomic_t io_count;
+   struct bio *orig_bio;
+   unsigned long start_time;
+   spinlock_t endio_lock;
+   struct dm_stats_aux stats_aux;
+   /* last member of dm_target_io is 'struct bio' */
+   struct dm_target_io tio;
+};
+
+static inline void dm_io_inc_pending(struct dm_io *io)
+{
+   atomic_inc(>io_count);
+}
+
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error);
+
 static inline struct completion *dm_get_completion_from_kobject(struct kobject 
*kobj)
 {
return _of(kobj, struct dm_kobject_holder, kobj)->completion;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ed8c5a8df2e5..c200658d8bcb 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -74,38 +74,6 @@ struct clone_info {
unsigned sector_count;
 };
 
-/*
- * One of these is allocated per clone bio.
- */
-#define DM_TIO_MAGIC 7282014
-struct dm_target_io {
-   unsigned magic;
-   struct dm_io *io;
-   struct dm_target *ti;
-   unsigned target_bio_nr;
-   unsigned *len_ptr;
-   bool inside_dm_io;
-   struct bio clone;
-};
-
-/*
- * One of these is allocated per original bio.
- * It contains the first clone used for that original.
- */
-#define DM_IO_MAGIC 5191977
-struct dm_io {
-   unsigned magic;
-   struct mapped_device *md;
-   blk_status_t status;
-   atomic_t io_count;
-   struct bio *orig_bio;
-   unsigned long start_time;
-   spinlock_t endio_lock;
-   struct dm_stats_aux stats_aux;
-   /* last member of dm_target_io is 'struct bio' */
-   struct dm_target_io tio;
-};
-
 #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
 #define DM_IO_BIO_OFFSET \
(offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio))
@@ -137,19 +105,6 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr);
 
 #define MINOR_ALLOCED ((void *)-1)
 
-/*
- * Bits for the md->flags field.
- */
-#define DMF_BLOCK_IO_FOR_SUSPEND 0
-#define DMF_SUSPENDED 1
-#define DMF_FROZEN 2
-#define DMF_FREEING 3
-#define DMF_DELETING 4
-#define DMF_NOFLUSH_SUSPENDING 5
-#define DMF_DEFERRED_REMOVE 6
-#define DMF_SUSPENDED_INTERNALLY 7
-#define DMF_POST_SUSPENDING 8
-
 #define DM_NUMA_NODE NUMA_NO_NODE
 static int dm_numa_node = DM_NUMA_NODE;
 
@@ -825,7 +780,7 @@ static int __noflush_suspending(struct mapped_device *md)
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
  */
-static void dec_pending(struct dm_io *io, blk_status_t error)
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
 {
unsigned long flags;
blk_status_t io_error;
@@ -978,7 +933,7 @@ static void clone_endio(struct bio *bio)
}
 
free_tio(tio);
-   dec_pending(io, error);
+   dm_io_dec_pending(io, error);
 }
 
 /*
@@ -1246,7 +1201,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 * anything, the target has assumed 

[dm-devel] [PATCH v5 03/11] block: introduce BIO_ZONE_WRITE_LOCKED bio flag

2021-05-25 Thread Damien Le Moal
Introduce the BIO flag BIO_ZONE_WRITE_LOCKED to indicate that a BIO owns
the write lock of the zone it is targeting. This is the counterpart of
the struct request flag RQF_ZONE_WRITE_LOCKED.

This new BIO flag is reserved for now for zone write locking control
for device mapper targets exposing a zoned block device. Since in this
case, the lock flag must not be propagated to the struct request that
will be used to process the BIO, a BIO private flag is used rather than
changing the RQF_ZONE_WRITE_LOCKED request flag into a common REQ_XXX
flag that could be used for both BIO and request. This avoids conflicts
down the stack with the block IO scheduler zone write locking
(in mq-deadline).
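
The flag is manipulated with the existing generic BIO flag helpers, for
example (sketch of the intended use in the later DM patches):

	/* When DM takes the zone write lock for a clone BIO: */
	bio_set_flag(clone, BIO_ZONE_WRITE_LOCKED);

	/* On completion, or if the clone fails before submission: */
	if (bio_flagged(clone, BIO_ZONE_WRITE_LOCKED))
		bio_clear_flag(clone, BIO_ZONE_WRITE_LOCKED);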

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Himanshu Madhani 
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..e5cf12f102a2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -304,6 +304,7 @@ enum {
BIO_CGROUP_ACCT,/* has been accounted to a cgroup */
BIO_TRACKED,/* set if bio goes through the rq_qos path */
BIO_REMAPPED,
+   BIO_ZONE_WRITE_LOCKED,  /* Owns a zoned device zone write lock */
BIO_FLAG_LAST
 };
 
-- 
2.31.1




[dm-devel] [PATCH v5 08/11] dm: Forbid requeue of writes to zones

2021-05-25 Thread Damien Le Moal
A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target requests such a requeue, warn about
it and kill the IO.

The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.

This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey, as
none of these targets ever request a requeue.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-zone.c | 17 +
 drivers/md/dm.c  | 18 +++---
 drivers/md/dm.h  |  5 +
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index b42474043249..edc3bbb45637 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -104,6 +104,23 @@ int dm_report_zones(struct block_device *bdev, sector_t 
start, sector_t sector,
 }
 EXPORT_SYMBOL_GPL(dm_report_zones);
 
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   struct request_queue *q = md->queue;
+
+   if (!blk_queue_is_zoned(q))
+   return false;
+
+   switch (bio_op(bio)) {
+   case REQ_OP_WRITE_ZEROES:
+   case REQ_OP_WRITE_SAME:
+   case REQ_OP_WRITE:
+   return !op_is_flush(bio->bi_opf) && bio_sectors(bio);
+   default:
+   return false;
+   }
+}
+
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
if (!blk_queue_is_zoned(q))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c49976cc4e44..ed8c5a8df2e5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -846,11 +846,15 @@ static void dec_pending(struct dm_io *io, blk_status_t 
error)
 * Target requested pushing back the I/O.
 */
spin_lock_irqsave(>deferred_lock, flags);
-   if (__noflush_suspending(md))
+   if (__noflush_suspending(md) &&
+   !WARN_ON_ONCE(dm_is_zone_write(md, bio)))
/* NOTE early return due to BLK_STS_DM_REQUEUE 
below */
bio_list_add_head(>deferred, io->orig_bio);
else
-   /* noflush suspend was interrupted. */
+   /*
+* noflush suspend was interrupted or this is
+* a write to a zoned target.
+*/
io->status = BLK_STS_IOERR;
spin_unlock_irqrestore(>deferred_lock, flags);
}
@@ -947,7 +951,15 @@ static void clone_endio(struct bio *bio)
int r = endio(tio->ti, bio, );
switch (r) {
case DM_ENDIO_REQUEUE:
-   error = BLK_STS_DM_REQUEUE;
+   /*
+* Requeuing writes to a sequential zone of a zoned
+* target will break the sequential write pattern:
+* fail such IO.
+*/
+   if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
+   error = BLK_STS_IOERR;
+   else
+   error = BLK_STS_DM_REQUEUE;
fallthrough;
case DM_ENDIO_DONE:
break;
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index fdf1536a4b62..39c243258e24 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -107,8 +107,13 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q);
 #ifdef CONFIG_BLK_DEV_ZONED
 int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
unsigned int nr_zones, report_zones_cb cb, void *data);
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio);
 #else
 #define dm_blk_report_zonesNULL
+static inline bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   return false;
+}
 #endif
 
 /*-
-- 
2.31.1




[dm-devel] [PATCH v5 07/11] dm: Introduce dm_report_zones()

2021-05-25 Thread Damien Le Moal
To simplify the implementation of the report_zones operation of a zoned
target, introduce the function dm_report_zones() to set a target
mapping start sector in struct dm_report_zones_args and call
blkdev_report_zones(). This new function is exported, while the report
zones callback function dm_report_zones_cb() is made static.

dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Himanshu Madhani 
---
 drivers/md/dm-crypt.c |  7 +++
 drivers/md/dm-flakey.c|  7 +++
 drivers/md/dm-linear.c|  7 +++
 drivers/md/dm-zone.c  | 23 ---
 include/linux/device-mapper.h |  3 ++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b0ab080f2567..f410ceee51d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3138,11 +3138,10 @@ static int crypt_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct crypt_config *cc = ti->private;
-   sector_t sector = cc->start + dm_target_offset(ti, args->next_sector);
 
-   args->start = cc->start;
-   return blkdev_report_zones(cc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(cc->dev->bdev, cc->start,
+   cc->start + dm_target_offset(ti, args->next_sector),
+   args, nr_zones);
 }
 #else
 #define crypt_report_zones NULL
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b7fee9936f05..5877220c01ed 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -463,11 +463,10 @@ static int flakey_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct flakey_c *fc = ti->private;
-   sector_t sector = flakey_map_sector(ti, args->next_sector);
 
-   args->start = fc->start;
-   return blkdev_report_zones(fc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(fc->dev->bdev, fc->start,
+  flakey_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define flakey_report_zones NULL
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 92db0f5e7f28..c91f1e2e2f65 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -140,11 +140,10 @@ static int linear_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct linear_c *lc = ti->private;
-   sector_t sector = linear_map_sector(ti, args->next_sector);
 
-   args->start = lc->start;
-   return blkdev_report_zones(lc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(lc->dev->bdev, lc->start,
+  linear_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define linear_report_zones NULL
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 3243c42b7951..b42474043249 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -56,7 +56,8 @@ int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
return ret;
 }
 
-int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+static int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
 {
struct dm_report_zones_args *args = data;
sector_t sector_diff = args->tgt->begin - args->start;
@@ -84,7 +85,24 @@ int dm_report_zones_cb(struct blk_zone *zone, unsigned int 
idx, void *data)
args->next_sector = zone->start + zone->len;
return args->orig_cb(zone, args->zone_idx++, args->orig_data);
 }
-EXPORT_SYMBOL_GPL(dm_report_zones_cb);
+
+/*
+ * Helper for drivers of zoned targets to implement struct target_type
+ * report_zones operation.
+ */
+int dm_report_zones(struct block_device *bdev, sector_t start, sector_t sector,
+   struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+   /*
+* Set the target mapping start sector first so that
+* dm_report_zones_cb() can correctly remap zone information.
+*/
+   args->start = start;
+
+   return blkdev_report_zones(bdev, sector, nr_zones,
+  dm_report_zones_cb, args);
+}
+EXPORT_SYMBOL_GPL(dm_report_zones);
 
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
@@ -99,4 +117,3 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q)
  

[dm-devel] [PATCH v5 01/11] block: improve handling of all zones reset operation

2021-05-25 Thread Damien Le Moal
SCSI, ZNS and null_blk zoned devices support resetting all zones using
a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device
request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for
device mapper targets creating zoned devices. In this case, a user
request for resetting all zones of a device is processed in
blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each
zone of the device. This leads to different behaviors of the
BLKRESETZONE ioctl() depending on the target device support for the
reset all operation. E.g.

blkzone reset /dev/sdX

will reset all zones of a SCSI device using a single command that will
ignore conventional, read-only or offline zones.

But a dm-linear device including conventional, read-only or offline
zones cannot be reset in the same manner as some of the single zone
reset operations issued by blkdev_zone_mgmt() will fail. E.g.:

blkzone reset /dev/dm-Y
blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error

To simplify applications and tools development, unify the behavior of
the all-zone reset operation by modifying blkdev_zone_mgmt() to not
issue a zone reset operation for conventional, read-only and offline
zones, thus mimicking what an actual reset-all device command does on a
device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using
the new function blkdev_zone_reset_all_emulated(). The zones needing a
reset are identified using a bitmap that is initialized using a zone
report. Since empty zones do not need a reset, also ignore these zones.
The function blkdev_zone_reset_all() is introduced for block devices
natively supporting reset all operations. blkdev_zone_mgmt() is modified
to call either function to execute an all zone reset request.
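
For reference, the user-facing path exercised here is the BLKRESETZONE
ioctl over the whole device capacity, which is what the blkzone tool
issues. A minimal user-space sketch (hypothetical helper, error handling
reduced):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

static int reset_all_zones(const char *path, unsigned long long nr_sectors)
{
	/* A range covering the entire device requests an all-zone reset.
	 * With this patch, conventional, read-only, offline and empty
	 * zones no longer cause the emulated path to fail. */
	struct blk_zone_range range = { .sector = 0, .nr_sectors = nr_sectors };
	int fd, ret;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, BLKRESETZONE, &range);
	close(fd);
	return ret;
}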

Signed-off-by: Damien Le Moal 
[hch: split into multiple functions]
Signed-off-by: Christoph Hellwig 
Reviewed-by: Chaitanya Kulkarni 
---
 block/blk-zoned.c | 119 +++---
 1 file changed, 92 insertions(+), 27 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 250cb76ee615..86fce751bb17 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -161,18 +161,89 @@ int blkdev_report_zones(struct block_device *bdev, 
sector_t sector,
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
-static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
-   sector_t sector,
-   sector_t nr_sectors)
+static inline unsigned long *blk_alloc_zone_bitmap(int node,
+  unsigned int nr_zones)
 {
-   if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
-   return false;
+   return kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(unsigned long),
+   GFP_NOIO, node);
+}
 
+static int blk_zone_need_reset_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
+{
/*
-* REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
-* of the applicable zone range is the entire disk.
+* For an all-zones reset, ignore conventional, empty, read-only
+* and offline zones.
 */
-   return !sector && nr_sectors == get_capacity(bdev->bd_disk);
+   switch (zone->cond) {
+   case BLK_ZONE_COND_NOT_WP:
+   case BLK_ZONE_COND_EMPTY:
+   case BLK_ZONE_COND_READONLY:
+   case BLK_ZONE_COND_OFFLINE:
+   return 0;
+   default:
+   set_bit(idx, (unsigned long *)data);
+   return 0;
+   }
+}
+
+static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
+ gfp_t gfp_mask)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   sector_t capacity = get_capacity(bdev->bd_disk);
+   sector_t zone_sectors = blk_queue_zone_sectors(q);
+   unsigned long *need_reset;
+   struct bio *bio = NULL;
+   sector_t sector = 0;
+   int ret;
+
+   need_reset = blk_alloc_zone_bitmap(q->node, q->nr_zones);
+   if (!need_reset)
+   return -ENOMEM;
+
+   ret = bdev->bd_disk->fops->report_zones(bdev->bd_disk, 0,
+   q->nr_zones, blk_zone_need_reset_cb,
+   need_reset);
+   if (ret < 0)
+   goto out_free_need_reset;
+
+   ret = 0;
+   while (sector < capacity) {
+   if (!test_bit(blk_queue_zone_no(q, sector), need_reset)) {
+   sector += zone_sectors;
+   continue;
+   }
+
+   bio = blk_next_bio(bio, 0, gfp_mask);
+   bio_set_dev(bio, bdev);
+   bio->bi_opf = REQ_OP_ZONE_RESET | REQ_SYNC;
+   bio->bi_iter.bi_sector = sector;
+   sector += zone_sectors;
+
+   /* This may take a while, so

[dm-devel] [PATCH v5 02/11] block: introduce bio zone helpers

2021-05-25 Thread Damien Le Moal
Introduce the helper functions bio_zone_no() and bio_zone_is_seq().
Both are the BIO counterparts of the request helpers blk_rq_zone_no()
and blk_rq_zone_is_seq(), respectively returning the number of the
target zone of a bio and true if the BIO target zone is sequential.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Himanshu Madhani 
---
 include/linux/blkdev.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f69c75bd6d27..2db0f376f5d9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1008,6 +1008,18 @@ static inline unsigned int blk_rq_stats_sectors(const 
struct request *rq)
 /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
 const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
 
+static inline unsigned int bio_zone_no(struct bio *bio)
+{
+   return blk_queue_zone_no(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
+static inline unsigned int bio_zone_is_seq(struct bio *bio)
+{
+   return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
 static inline unsigned int blk_rq_zone_no(struct request *rq)
 {
return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
-- 
2.31.1




Re: [dm-devel] [PATCH v4 01/11] block: improve handling of all zones reset operation

2021-05-25 Thread Damien Le Moal
On 2021/05/25 15:15, Chaitanya Kulkarni wrote:
> On 5/24/21 7:25 PM, Damien Le Moal wrote:
>> SCSI, ZNS and null_blk zoned devices support resetting all zones using
>> a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device
>> request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for
>> device mapper targets creating zoned devices. In this case, a user
>> request for resetting all zones of a device is processed in
>> blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each
>> zone of the device. This leads to different behaviors of the
>> BLKRESETZONE ioctl() depending on the target device support for the
>> reset all operation. E.g.
>>
>> blkzone reset /dev/sdX
>>
>> will reset all zones of a SCSI device using a single command that will
>> ignore conventional, read-only or offline zones.
>>
>> But a dm-linear device including conventional, read-only or offline
>> zones cannot be reset in the same manner as some of the single zone
>> reset operations issued by blkdev_zone_mgmt() will fail. E.g.:
>>
>> blkzone reset /dev/dm-Y
>> blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error
>>
>> To simplify applications and tools development, unify the behavior of
>> the all-zone reset operation by modifying blkdev_zone_mgmt() to not
>> issue a zone reset operation for conventional, read-only and offline
>> zones, thus mimicking what an actual reset-all device command does on a
>> device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using
>> the new function blkdev_zone_reset_all_emulated(). The zones needing a
>> reset are identified using a bitmap that is initialized using a zone
>> report. Since empty zones do not need a reset, also ignore these zones.
>> The function blkdev_zone_reset_all() is introduced for block devices
>> natively supporting reset all operations. blkdev_zone_mgmt() is modified
>> to call either function to execute an all zone reset request.
>>
>> Signed-off-by: Damien Le Moal 
>> [hch: split into multiple functions]
>> Signed-off-by: Christoph Hellwig 
> 
> Apart from nit mentioned earlier, looks good.
> 
> Reviewed-by: Chaitanya Kulkarni 

Thanks. Will send v5 with a correction of the extra space.

> 
> 
> 


-- 
Damien Le Moal
Western Digital Research






[dm-devel] [PATCH v4 03/11] block: introduce BIO_ZONE_WRITE_LOCKED bio flag

2021-05-24 Thread Damien Le Moal
Introduce the BIO flag BIO_ZONE_WRITE_LOCKED to indicate that a BIO owns
the write lock of the zone it is targeting. This is the counterpart of
the struct request flag RQF_ZONE_WRITE_LOCKED.

This new BIO flag is reserved for now for zone write locking control
for device mapper targets exposing a zoned block device. Since in this
case, the lock flag must not be propagated to the struct request that
will be used to process the BIO, a BIO private flag is used rather than
changing the RQF_ZONE_WRITE_LOCKED request flag into a common REQ_XXX
flag that could be used for both BIO and request. This avoids conflicts
down the stack with the block IO scheduler zone write locking
(in mq-deadline).

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..e5cf12f102a2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -304,6 +304,7 @@ enum {
BIO_CGROUP_ACCT,/* has been accounted to a cgroup */
BIO_TRACKED,/* set if bio goes through the rq_qos path */
BIO_REMAPPED,
+   BIO_ZONE_WRITE_LOCKED,  /* Owns a zoned device zone write lock */
BIO_FLAG_LAST
 };
 
-- 
2.31.1




[dm-devel] [PATCH v4 05/11] dm: cleanup device_area_is_invalid()

2021-05-24 Thread Damien Le Moal
In device_area_is_invalid(), use bdev_is_zoned() instead of open
coding the test on the zoned model returned by bdev_zoned_model().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ee47a332b462..21fd9cd4da32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -249,7 +249,7 @@ static int device_area_is_invalid(struct dm_target *ti, 
struct dm_dev *dev,
 * If the target is mapped to zoned block device(s), check
 * that the zones are not partially mapped.
 */
-   if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {
+   if (bdev_is_zoned(bdev)) {
unsigned int zone_sectors = bdev_zone_sectors(bdev);
 
if (start & (zone_sectors - 1)) {
-- 
2.31.1




[dm-devel] [PATCH v4 07/11] dm: Introduce dm_report_zones()

2021-05-24 Thread Damien Le Moal
To simplify the implementation of the report_zones operation of a zoned
target, introduce the function dm_report_zones() to set a target
mapping start sector in struct dm_report_zones_args and call
blkdev_report_zones(). This new function is exported, while the report
zones callback function dm_report_zones_cb() is made static.

dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-crypt.c |  7 +++
 drivers/md/dm-flakey.c|  7 +++
 drivers/md/dm-linear.c|  7 +++
 drivers/md/dm-zone.c  | 23 ---
 include/linux/device-mapper.h |  3 ++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b0ab080f2567..f410ceee51d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3138,11 +3138,10 @@ static int crypt_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct crypt_config *cc = ti->private;
-   sector_t sector = cc->start + dm_target_offset(ti, args->next_sector);
 
-   args->start = cc->start;
-   return blkdev_report_zones(cc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(cc->dev->bdev, cc->start,
+   cc->start + dm_target_offset(ti, args->next_sector),
+   args, nr_zones);
 }
 #else
 #define crypt_report_zones NULL
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b7fee9936f05..5877220c01ed 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -463,11 +463,10 @@ static int flakey_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct flakey_c *fc = ti->private;
-   sector_t sector = flakey_map_sector(ti, args->next_sector);
 
-   args->start = fc->start;
-   return blkdev_report_zones(fc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(fc->dev->bdev, fc->start,
+  flakey_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define flakey_report_zones NULL
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 92db0f5e7f28..c91f1e2e2f65 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -140,11 +140,10 @@ static int linear_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct linear_c *lc = ti->private;
-   sector_t sector = linear_map_sector(ti, args->next_sector);
 
-   args->start = lc->start;
-   return blkdev_report_zones(lc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(lc->dev->bdev, lc->start,
+  linear_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define linear_report_zones NULL
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 3243c42b7951..b42474043249 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -56,7 +56,8 @@ int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
return ret;
 }
 
-int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+static int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
 {
struct dm_report_zones_args *args = data;
sector_t sector_diff = args->tgt->begin - args->start;
@@ -84,7 +85,24 @@ int dm_report_zones_cb(struct blk_zone *zone, unsigned int 
idx, void *data)
args->next_sector = zone->start + zone->len;
return args->orig_cb(zone, args->zone_idx++, args->orig_data);
 }
-EXPORT_SYMBOL_GPL(dm_report_zones_cb);
+
+/*
+ * Helper for drivers of zoned targets to implement struct target_type
+ * report_zones operation.
+ */
+int dm_report_zones(struct block_device *bdev, sector_t start, sector_t sector,
+   struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+   /*
+* Set the target mapping start sector first so that
+* dm_report_zones_cb() can correctly remap zone information.
+*/
+   args->start = start;
+
+   return blkdev_report_zones(bdev, sector, nr_zones,
+  dm_report_zones_cb, args);
+}
+EXPORT_SYMBOL_GPL(dm_report_zones);
 
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
@@ -99,4 +117,3 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q)
WARN_ON_ONCE(queue_is_mq(q

[dm-devel] [PATCH v4 09/11] dm: rearrange core declarations

2021-05-24 Thread Damien Le Moal
Move the definitions of struct dm_target_io, struct dm_io and of the
bits of the flags field of struct mapped_device from dm.c to dm-core.h
to make them usable from dm-zone.c.
For the same reason, declare dec_pending() in dm-core.h after renaming
it to dm_io_dec_pending(). And for symmetry of the function names,
introduce the inline helper dm_io_inc_pending() instead of directly
using atomic_inc() calls.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-core.h | 52 ++
 drivers/md/dm.c  | 59 ++--
 2 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..cfabc1c91f9f 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -116,6 +116,19 @@ struct mapped_device {
struct srcu_struct io_barrier;
 };
 
+/*
+ * Bits for the flags field of struct mapped_device.
+ */
+#define DMF_BLOCK_IO_FOR_SUSPEND 0
+#define DMF_SUSPENDED 1
+#define DMF_FROZEN 2
+#define DMF_FREEING 3
+#define DMF_DELETING 4
+#define DMF_NOFLUSH_SUSPENDING 5
+#define DMF_DEFERRED_REMOVE 6
+#define DMF_SUSPENDED_INTERNALLY 7
+#define DMF_POST_SUSPENDING 8
+
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
 void disable_write_zeroes(struct mapped_device *md);
@@ -173,6 +186,45 @@ struct dm_table {
 #endif
 };
 
+/*
+ * One of these is allocated per clone bio.
+ */
+#define DM_TIO_MAGIC 7282014
+struct dm_target_io {
+   unsigned int magic;
+   struct dm_io *io;
+   struct dm_target *ti;
+   unsigned int target_bio_nr;
+   unsigned int *len_ptr;
+   bool inside_dm_io;
+   struct bio clone;
+};
+
+/*
+ * One of these is allocated per original bio.
+ * It contains the first clone used for that original.
+ */
+#define DM_IO_MAGIC 5191977
+struct dm_io {
+   unsigned int magic;
+   struct mapped_device *md;
+   blk_status_t status;
+   atomic_t io_count;
+   struct bio *orig_bio;
+   unsigned long start_time;
+   spinlock_t endio_lock;
+   struct dm_stats_aux stats_aux;
+   /* last member of dm_target_io is 'struct bio' */
+   struct dm_target_io tio;
+};
+
+static inline void dm_io_inc_pending(struct dm_io *io)
+{
+   atomic_inc(>io_count);
+}
+
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error);
+
 static inline struct completion *dm_get_completion_from_kobject(struct kobject 
*kobj)
 {
return _of(kobj, struct dm_kobject_holder, kobj)->completion;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4426019a89cc..563504163b74 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -74,38 +74,6 @@ struct clone_info {
unsigned sector_count;
 };
 
-/*
- * One of these is allocated per clone bio.
- */
-#define DM_TIO_MAGIC 7282014
-struct dm_target_io {
-   unsigned magic;
-   struct dm_io *io;
-   struct dm_target *ti;
-   unsigned target_bio_nr;
-   unsigned *len_ptr;
-   bool inside_dm_io;
-   struct bio clone;
-};
-
-/*
- * One of these is allocated per original bio.
- * It contains the first clone used for that original.
- */
-#define DM_IO_MAGIC 5191977
-struct dm_io {
-   unsigned magic;
-   struct mapped_device *md;
-   blk_status_t status;
-   atomic_t io_count;
-   struct bio *orig_bio;
-   unsigned long start_time;
-   spinlock_t endio_lock;
-   struct dm_stats_aux stats_aux;
-   /* last member of dm_target_io is 'struct bio' */
-   struct dm_target_io tio;
-};
-
 #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
 #define DM_IO_BIO_OFFSET \
(offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio))
@@ -137,19 +105,6 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr);
 
 #define MINOR_ALLOCED ((void *)-1)
 
-/*
- * Bits for the md->flags field.
- */
-#define DMF_BLOCK_IO_FOR_SUSPEND 0
-#define DMF_SUSPENDED 1
-#define DMF_FROZEN 2
-#define DMF_FREEING 3
-#define DMF_DELETING 4
-#define DMF_NOFLUSH_SUSPENDING 5
-#define DMF_DEFERRED_REMOVE 6
-#define DMF_SUSPENDED_INTERNALLY 7
-#define DMF_POST_SUSPENDING 8
-
 #define DM_NUMA_NODE NUMA_NO_NODE
 static int dm_numa_node = DM_NUMA_NODE;
 
@@ -825,7 +780,7 @@ static int __noflush_suspending(struct mapped_device *md)
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
  */
-static void dec_pending(struct dm_io *io, blk_status_t error)
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
 {
unsigned long flags;
blk_status_t io_error;
@@ -978,7 +933,7 @@ static void clone_endio(struct bio *bio)
}
 
free_tio(tio);
-   dec_pending(io, error);
+   dm_io_dec_pending(io, error);
 }
 
 /*
@@ -1247,7 +1202,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 * anything, the target has assumed ownership of
 

[dm-devel] [PATCH v4 08/11] dm: Forbid requeue of writes to zones

2021-05-24 Thread Damien Le Moal
A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target requests such a requeue, warn about
it and kill the IO.

The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.

This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey, as
none of these targets ever request a requeue.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-zone.c | 17 +
 drivers/md/dm.c  | 18 +++---
 drivers/md/dm.h  |  5 +
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index b42474043249..edc3bbb45637 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -104,6 +104,23 @@ int dm_report_zones(struct block_device *bdev, sector_t 
start, sector_t sector,
 }
 EXPORT_SYMBOL_GPL(dm_report_zones);
 
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   struct request_queue *q = md->queue;
+
+   if (!blk_queue_is_zoned(q))
+   return false;
+
+   switch (bio_op(bio)) {
+   case REQ_OP_WRITE_ZEROES:
+   case REQ_OP_WRITE_SAME:
+   case REQ_OP_WRITE:
+   return !op_is_flush(bio->bi_opf) && bio_sectors(bio);
+   default:
+   return false;
+   }
+}
+
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
if (!blk_queue_is_zoned(q))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 45d2dc2ee844..4426019a89cc 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -846,11 +846,15 @@ static void dec_pending(struct dm_io *io, blk_status_t 
error)
 * Target requested pushing back the I/O.
 */
spin_lock_irqsave(>deferred_lock, flags);
-   if (__noflush_suspending(md))
+   if (__noflush_suspending(md) &&
+   !WARN_ON_ONCE(dm_is_zone_write(md, bio)))
/* NOTE early return due to BLK_STS_DM_REQUEUE 
below */
bio_list_add_head(>deferred, io->orig_bio);
else
-   /* noflush suspend was interrupted. */
+   /*
+* noflush suspend was interrupted or this is
+* a write to a zoned target.
+*/
io->status = BLK_STS_IOERR;
spin_unlock_irqrestore(>deferred_lock, flags);
}
@@ -947,7 +951,15 @@ static void clone_endio(struct bio *bio)
int r = endio(tio->ti, bio, );
switch (r) {
case DM_ENDIO_REQUEUE:
-   error = BLK_STS_DM_REQUEUE;
+   /*
+* Requeuing writes to a sequential zone of a zoned
+* target will break the sequential write pattern:
+* fail such IO.
+*/
+   if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
+   error = BLK_STS_IOERR;
+   else
+   error = BLK_STS_DM_REQUEUE;
fallthrough;
case DM_ENDIO_DONE:
break;
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index fdf1536a4b62..39c243258e24 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -107,8 +107,13 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q);
 #ifdef CONFIG_BLK_DEV_ZONED
 int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
unsigned int nr_zones, report_zones_cb cb, void *data);
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio);
 #else
 #define dm_blk_report_zonesNULL
+static inline bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   return false;
+}
 #endif
 
 /*-
-- 
2.31.1




[dm-devel] [PATCH v4 10/11] dm: introduce zone append emulation

2021-05-24 Thread Damien Le Moal
For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.

To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intercepted in __map_bio() and
processed using the new function dm_zone_map_bio().

Detection of the target's ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operations, either because it explicitly requests emulation
using the new struct dm_target field emulate_zone_append, or because the
device table contains a non-zoned device, has its mapped device marked
with the new flag DMF_EMULATE_ZONE_APPEND. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.

Atomicity of the zone write pointer tracking and updates is ensured using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO is failed
before submission or when dm_zone_endio() is called when the clone BIO
completes.

The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().

For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.

All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all
dm zone related functions.

Signed-off-by: Damien Le Moal 
---
 drivers/md/dm-core.h  |  13 +
 drivers/md/dm-table.c |  19 +-
 drivers/md/dm-zone.c  | 580 --
 drivers/md/dm.c   |  38 ++-
 drivers/md/dm.h   |  16 +-
 include/linux/device-mapper.h |   6 +
 6 files changed, 618 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index cfabc1c91f9f..edc1553c4eea 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -114,6 +114,11 @@ struct mapped_device {
bool init_tio_pdu:1;
 
struct srcu_struct io_barrier;
+
+#ifdef CONFIG_BLK_DEV_ZONED
+   unsigned int nr_zones;
+   unsigned int *zwp_offset;
+#endif
 };
 
 /*
@@ -128,6 +133,7 @@ struct mapped_device {
 #define DMF_DEFERRED_REMOVE 6
 #define DMF_SUSPENDED_INTERNALLY 7
 #define DMF_POST_SUSPENDING 8
+#define DMF_EMULATE_ZONE_APPEND 9
 
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
@@ -143,6 +149,13 @@ static inline struct dm_stats *dm_get_stats(struct 
mapped_device *md)
return >stats;
 }
 
+static inline bool dm_emulate_zone_append(struct mapped_device *md)
+{
+   if (blk_queue_is_zoned(md->queue))
+   return test_bit(DMF_EMULATE_ZONE_APPEND, >flags);
+   return false;
+}
+
 #define DM_TABLE_MAX_DEPTH 16
 
 struct dm_table {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index dd9f648ab598..21fdccfb16cf 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1981,11 +1981,12 @@ static int device_requires_stable_pages(struct 
dm_target *ti,
return blk_queue_stable_writes(q);
 }
 
-void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
-  struct queue_limits *limits)
+int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
+ struct queue_limits *limits)
 {
bool wc = false, fua = false;
int page_size = PAGE_SIZE;
+   int r;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -2064,12 +2065,20 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /* For a zoned target, setup the zone

[dm-devel] [PATCH v4 11/11] dm crypt: Fix zoned block device support

2021-05-24 Thread Damien Le Moal
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cypher block IV. All data written in a zone ends up using the same IV
values, corresponding to the first sectors of the zone, while read
operations specify the actual sector of the data within the zone,
resulting in an IV mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki 
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-crypt.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..50f4cbd600d5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int 
argc, char **argv)
}
cc->start = tmpll;
 
-   /*
-* For zoned block devices, we need to preserve the issuer write
-* ordering. To do so, disable write workqueues and force inline
-* encryption completion.
-*/
if (bdev_is_zoned(cc->dev->bdev)) {
+   /*
+* For zoned block devices, we need to preserve the issuer write
+* ordering. To do so, disable write workqueues and force inline
+* encryption completion.
+*/
set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, >flags);
set_bit(DM_CRYPT_WRITE_INLINE, >flags);
+
+   /*
+* All zone append writes to a zone of a zoned block device will
+* have the same BIO sector, the start of the zone. When the
+* cypher IV mode uses sector values, all data targeting a
+* zone will be encrypted using the first sector numbers of the
+* zone. This will not result in write errors but will
+* cause most reads to fail as reads will use the sector values
+* for the actual data locations, resulting in IV mismatch.
+* To avoid this problem, ask DM core to emulate zone append
+* operations with regular writes.
+*/
+   DMDEBUG("Zone append operations will be emulated");
+   ti->emulate_zone_append = true;
}
 
if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
-- 
2.31.1




[dm-devel] [PATCH v4 04/11] dm: Fix dm_accept_partial_bio()

2021-05-24 Thread Damien Le Moal
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed, as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.

Blank lines are added around the group of BUG_ON() calls to make the
code more legible.

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..a9211575bfed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1237,8 +1237,9 @@ static int dm_dax_zero_page_range(struct dax_device 
*dax_dev, pgoff_t pgoff,
 
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
- * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
- * REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH.
+ * allowed for all bio types except REQ_PREFLUSH, zone management operations
+ * (REQ_OP_ZONE_RESET, REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and
+ * REQ_OP_ZONE_FINISH) and zone append writes.
  *
  * dm_accept_partial_bio informs the dm that the target only wants to process
  * additional n_sectors sectors of the bio and the rest of the data should be
@@ -1268,9 +1269,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned 
n_sectors)
 {
struct dm_target_io *tio = container_of(bio, struct dm_target_io, 
clone);
unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+
BUG_ON(bio->bi_opf & REQ_PREFLUSH);
+   BUG_ON(op_is_zone_mgmt(bio_op(bio)));
+   BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
BUG_ON(bi_size > *tio->len_ptr);
BUG_ON(n_sectors > bi_size);
+
*tio->len_ptr -= bi_size - n_sectors;
bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
 }
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v4 06/11] dm: move zone related code to dm-zone.c

2021-05-24 Thread Damien Le Moal
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device's request queue zone attributes in
dm_table_set_restrictions().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/Makefile   |   4 ++
 drivers/md/dm-table.c |  14 ++
 drivers/md/dm-zone.c  | 102 ++
 drivers/md/dm.c   |  78 
 drivers/md/dm.h   |  11 +
 5 files changed, 120 insertions(+), 89 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index ef7ddc27685c..a74aaf8b1445 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -92,6 +92,10 @@ ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs+= dm-uevent.o
 endif
 
+ifeq ($(CONFIG_BLK_DEV_ZONED),y)
+dm-mod-objs+= dm-zone.o
+endif
+
 ifeq ($(CONFIG_DM_VERITY_FEC),y)
 dm-verity-objs += dm-verity-fec.o
 endif
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 21fd9cd4da32..dd9f648ab598 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -2064,17 +2064,9 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /*
-* For a zoned target, the number of zones should be updated for the
-* correct value to be exposed in sysfs queue/nr_zones. For a BIO based
-* target, this is all that is needed.
-*/
-#ifdef CONFIG_BLK_DEV_ZONED
-   if (blk_queue_is_zoned(q)) {
-   WARN_ON_ONCE(queue_is_mq(q));
-   q->nr_zones = blkdev_nr_zones(t->md->disk);
-   }
-#endif
+   /* For a zoned target, set up the zone related queue attributes */
+   if (blk_queue_is_zoned(q))
+   dm_set_zones_restrictions(t, q);
 
dm_update_keyslot_manager(q, t);
blk_queue_update_readahead(q);
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
new file mode 100644
index ..3243c42b7951
--- /dev/null
+++ b/drivers/md/dm-zone.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+
+#include "dm-core.h"
+
+/*
+ * User facing dm device block device report zone operation. This calls the
+ * report_zones operation for each target of a device table. This operation is
+ * generally implemented by targets using dm_report_zones().
+ */
+int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
+   unsigned int nr_zones, report_zones_cb cb, void *data)
+{
+   struct mapped_device *md = disk->private_data;
+   struct dm_table *map;
+   int srcu_idx, ret;
+   struct dm_report_zones_args args = {
+   .next_sector = sector,
+   .orig_data = data,
+   .orig_cb = cb,
+   };
+
+   if (dm_suspended_md(md))
+   return -EAGAIN;
+
+   map = dm_get_live_table(md, &srcu_idx);
+   if (!map) {
+   ret = -EIO;
+   goto out;
+   }
+
+   do {
+   struct dm_target *tgt;
+
+   tgt = dm_table_find_target(map, args.next_sector);
+   if (WARN_ON_ONCE(!tgt->type->report_zones)) {
+   ret = -EIO;
+   goto out;
+   }
+
+   args.tgt = tgt;
+   ret = tgt->type->report_zones(tgt, &args,
+ nr_zones - args.zone_idx);
+   if (ret < 0)
+   goto out;
+   } while (args.zone_idx < nr_zones &&
+args.next_sector < get_capacity(disk));
+
+   ret = args.zone_idx;
+out:
+   dm_put_live_table(md, srcu_idx);
+   return ret;
+}
+
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+{
+   struct dm_report_zones_args *args = data;
+   sector_t sector_diff = args->tgt->begin - args->start;
+
+   /*
+* Ignore zones beyond the target range.
+*/
+   if (zone->start >= args->start + args->tgt->len)
+   return 0;
+
+   /*
+* Remap the start sector and write pointer position of the zone
+* to match its position in the target range.
+*/
+   zone->start += sector_diff;
+   if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
+   if (zone->cond == BLK_ZONE_COND_FULL)
+   zone->wp = zone->start + zone->len;
+   else if (zone->cond == BLK_ZONE_COND_EMPTY)
+

[dm-devel] [PATCH v4 02/11] block: introduce bio zone helpers

2021-05-24 Thread Damien Le Moal
Introduce the helper functions bio_zone_no() and bio_zone_is_seq().
Both are the BIO counterparts of the request helpers blk_rq_zone_no()
and blk_rq_zone_is_seq(), respectively returning the number of the
target zone of a bio and true if the BIO target zone is sequential.
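
A hedged usage sketch (the caller below is hypothetical):

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Decide whether a BIO needs zone write pointer handling, that is,
 * whether it targets a sequential zone.
 */
static bool bio_needs_wp_tracking(struct bio *bio)
{
	pr_debug("bio targets zone %u\n", bio_zone_no(bio));
	return bio_zone_is_seq(bio);
}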

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
---
 include/linux/blkdev.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f69c75bd6d27..2db0f376f5d9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1008,6 +1008,18 @@ static inline unsigned int blk_rq_stats_sectors(const 
struct request *rq)
 /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
 const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
 
+static inline unsigned int bio_zone_no(struct bio *bio)
+{
+   return blk_queue_zone_no(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
+static inline unsigned int bio_zone_is_seq(struct bio *bio)
+{
+   return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
 static inline unsigned int blk_rq_zone_no(struct request *rq)
 {
return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v4 01/11] block: improve handling of all zones reset operation

2021-05-24 Thread Damien Le Moal
SCSI, ZNS and null_blk zoned devices support resetting all zones using
a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device
request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for
device mapper targets creating zoned devices. In this case, a user
request for resetting all zones of a device is processed in
blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each
zone of the device. This leads to different behaviors of the
BLKRESETZONE ioctl() depending on the target device support for the
reset all operation. E.g.

blkzone reset /dev/sdX

will reset all zones of a SCSI device using a single command that will
ignore conventional, read-only or offline zones.

But a dm-linear device including conventional, read-only or offline
zones cannot be reset in the same manner as some of the single zone
reset operations issued by blkdev_zone_mgmt() will fail. E.g.:

blkzone reset /dev/dm-Y
blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error

To simplify applications and tools development, unify the behavior of
the all-zone reset operation by modifying blkdev_zone_mgmt() to not
issue a zone reset operation for conventional, read-only and offline
zones, thus mimicking what an actual reset-all device command does on a
device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using
the new function blkdev_zone_reset_all_emulated(). The zones needing a
reset are identified using a bitmap that is initialized using a zone
report. Since empty zones do not need a reset, also ignore these zones.
The function blkdev_zone_reset_all() is introduced for block devices
natively supporting reset all operations. blkdev_zone_mgmt() is modified
to call either function to execute an all zone reset request.

Signed-off-by: Damien Le Moal 
[hch: split into multiple functions]
Signed-off-by: Christoph Hellwig 
---
 block/blk-zoned.c | 119 +++---
 1 file changed, 92 insertions(+), 27 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 250cb76ee615..f47f688b6ea6 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -161,18 +161,89 @@ int blkdev_report_zones(struct block_device *bdev, 
sector_t sector,
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
-static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
-   sector_t sector,
-   sector_t nr_sectors)
+static inline unsigned long *blk_alloc_zone_bitmap(int node,
+  unsigned int nr_zones)
 {
-   if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
-   return false;
+   return kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(unsigned long),
+   GFP_NOIO, node);
+}
 
+static int blk_zone_need_reset_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
+{
/*
-* REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
-* of the applicable zone range is the entire disk.
+* For an all-zones reset, ignore conventional, empty, read-only
+* and offline zones.
 */
-   return !sector && nr_sectors == get_capacity(bdev->bd_disk);
+   switch (zone->cond) {
+   case BLK_ZONE_COND_NOT_WP:
+   case BLK_ZONE_COND_EMPTY:
+   case BLK_ZONE_COND_READONLY:
+   case BLK_ZONE_COND_OFFLINE:
+   return 0;
+   default:
+   set_bit(idx, (unsigned long *)data);
+   return 0;
+   }
+}
+
+static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
+ gfp_t gfp_mask)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   sector_t capacity = get_capacity(bdev->bd_disk);
+   sector_t zone_sectors = blk_queue_zone_sectors(q);
+   unsigned long *need_reset;
+   struct bio *bio = NULL;
+   sector_t sector = 0;
+   int ret;
+
+   need_reset = blk_alloc_zone_bitmap(q->node, q->nr_zones);
+   if (!need_reset)
+   return -ENOMEM;
+
+   ret = bdev->bd_disk->fops->report_zones(bdev->bd_disk, 0,
+   q->nr_zones, blk_zone_need_reset_cb,
+   need_reset);
+   if (ret < 0)
+   goto out_free_need_reset;
+
+   ret = 0;
+   while (sector < capacity) {
+   if (!test_bit(blk_queue_zone_no(q, sector), need_reset)) {
+   sector += zone_sectors;
+   continue;
+   }
+
+   bio = blk_next_bio(bio, 0, gfp_mask);
+   bio_set_dev(bio, bdev);
+   bio->bi_opf = REQ_OP_ZONE_RESET | REQ_SYNC;
+   bio->bi_iter.bi_sector = sector;
+   sector += zone_sectors;
+
+   /* This may take a while, so be nice to others */
+

[dm-devel] [PATCH v4 00/11] dm: Improve zoned block device support

2021-05-24 Thread Damien Le Moal
This series improves device mapper support for zoned block devices and
for targets exposing a zoned device.

The first patch improves support for user requests to reset all zones of
the target device. With the fix, such an operation behaves similarly to
the implementation for physical block devices, based on the single zone
reset command with the ALL bit set.

The following 2 patches are preparatory block layer patches.

Patches 4 and 5 are two small fixes to DM core zoned block device support.

Patch 6 reorganizes DM core code, moving conditionally defined zoned
block device code into the new dm-zone.c file. This avoids sprinkling DM
core with zone related code defined under #ifdef CONFIG_BLK_DEV_ZONED.

Patch 7 improves DM zone report helper functions for target drivers.

Patch 8 fixes a potential problem with BIO requeue on a zoned target.

Finally, patches 9 to 11 implement zone append emulation using regular
writes for target drivers that cannot natively support this BIO type.
The only target currently needing this emulation is dm-crypt. With this
change, a zoned dm-crypt device behaves exactly like a regular zoned
block device, correctly executing user zone append BIOs.

This series passes the following tests:
1) zonefs tests on top of dm-crypt with a zoned nullblk device
2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
3) btrfs fstests on top of dm-crypt with zoned nullblk devices.

Comments are as always welcome.

Changes from v3:
* Fixed missing variable initialization in
  blkdev_zone_reset_all_emulated() in patch 1.
* Rebased on rc3
* Added reviewed-by tags

Changes from v2:
* Replace use of spinlock to protect the zone write pointer offset
  array in patch 11 with READ_ONCE/WRITE_ONCE as suggested by Hannes.
* Added reviewed-by tags

Changes from v1:
* Use Christoph's proposed approach for patch 1 (split reset all
  processing into different functions)
* Changed helpers introduced in patch 2 to remove the request_queue
  argument
* Improve patch 3 commit message as suggested by Christoph (explaining
  that the flag is a special case that cannot use a REQ_XXX flag)
* Changed DMWARN() into DMDEBUG in patch 11 as suggested by Milan
* Added reviewed-by tags

Damien Le Moal (11):
  block: improve handling of all zones reset operation
  block: introduce bio zone helpers
  block: introduce BIO_ZONE_WRITE_LOCKED bio flag
  dm: Fix dm_accept_partial_bio()
  dm: cleanup device_area_is_invalid()
  dm: move zone related code to dm-zone.c
  dm: Introduce dm_report_zones()
  dm: Forbid requeue of writes to zones
  dm: rearrange core declarations
  dm: introduce zone append emulation
  dm crypt: Fix zoned block device support

 block/blk-zoned.c | 119 +--
 drivers/md/Makefile   |   4 +
 drivers/md/dm-core.h  |  65 
 drivers/md/dm-crypt.c |  31 +-
 drivers/md/dm-flakey.c|   7 +-
 drivers/md/dm-linear.c|   7 +-
 drivers/md/dm-table.c |  21 +-
 drivers/md/dm-zone.c  | 654 ++
 drivers/md/dm.c   | 202 +++
 drivers/md/dm.h   |  30 +-
 include/linux/blk_types.h |   1 +
 include/linux/blkdev.h|  12 +
 include/linux/device-mapper.h |   9 +-
 13 files changed, 955 insertions(+), 207 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 04/11] dm: Fix dm_accept_partial_bio()

2021-05-20 Thread Damien Le Moal
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.

Blank lines are added around the group of BUG_ON() calls to make the
code more legible.

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..a9211575bfed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1237,8 +1237,9 @@ static int dm_dax_zero_page_range(struct dax_device 
*dax_dev, pgoff_t pgoff,
 
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
- * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
- * REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH.
+ * allowed for all bio types except REQ_PREFLUSH, zone management operations
+ * (REQ_OP_ZONE_RESET, REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and
+ * REQ_OP_ZONE_FINISH) and zone append writes.
  *
  * dm_accept_partial_bio informs the dm that the target only wants to process
  * additional n_sectors sectors of the bio and the rest of the data should be
@@ -1268,9 +1269,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned 
n_sectors)
 {
struct dm_target_io *tio = container_of(bio, struct dm_target_io, 
clone);
unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+
BUG_ON(bio->bi_opf & REQ_PREFLUSH);
+   BUG_ON(op_is_zone_mgmt(bio_op(bio)));
+   BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
BUG_ON(bi_size > *tio->len_ptr);
BUG_ON(n_sectors > bi_size);
+
*tio->len_ptr -= bi_size - n_sectors;
bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
 }
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 05/11] dm: cleanup device_area_is_invalid()

2021-05-20 Thread Damien Le Moal
In device_area_is_invalid(), use bdev_is_zoned() instead of open
coding the test on the zoned model returned by bdev_zoned_model().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ee47a332b462..21fd9cd4da32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -249,7 +249,7 @@ static int device_area_is_invalid(struct dm_target *ti, 
struct dm_dev *dev,
 * If the target is mapped to zoned block device(s), check
 * that the zones are not partially mapped.
 */
-   if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {
+   if (bdev_is_zoned(bdev)) {
unsigned int zone_sectors = bdev_zone_sectors(bdev);
 
if (start & (zone_sectors - 1)) {
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 11/11] dm crypt: Fix zoned block device support

2021-05-20 Thread Damien Le Moal
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cipher block IV. All data written in a zone ends up using the same IV
values corresponding to the first sectors of the zone, but read
operations will specify any sector within the zone, resulting in an IV
mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki 
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-crypt.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..50f4cbd600d5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int 
argc, char **argv)
}
cc->start = tmpll;
 
-   /*
-* For zoned block devices, we need to preserve the issuer write
-* ordering. To do so, disable write workqueues and force inline
-* encryption completion.
-*/
if (bdev_is_zoned(cc->dev->bdev)) {
+   /*
+* For zoned block devices, we need to preserve the issuer write
+* ordering. To do so, disable write workqueues and force inline
+* encryption completion.
+*/
set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+
+   /*
+* All zone append writes to a zone of a zoned block device will
+* have the same BIO sector, the start of the zone. When the
+* cipher IV mode uses sector values, all data targeting a
+* zone will be encrypted using the first sector numbers of the
+* zone. This will not result in write errors but will
+* cause most reads to fail as reads will use the sector values
+* for the actual data locations, resulting in IV mismatch.
+* To avoid this problem, ask DM core to emulate zone append
+* operations with regular writes.
+*/
+   DMDEBUG("Zone append operations will be emulated");
+   ti->emulate_zone_append = true;
}
 
if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 09/11] dm: rearrange core declarations

2021-05-20 Thread Damien Le Moal
Move the definitions of struct dm_target_io, struct dm_io and of the
bits of the flags field of struct mapped_device from dm.c to dm-core.h
to make them usable from dm-zone.c.
For the same reason, declare dec_pending() in dm-core.h after renaming
it to dm_io_dec_pending(). And for symmetry of the function names,
introduce the inline helper dm_io_inc_pending() instead of directly
using atomic_inc() calls.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-core.h | 52 ++
 drivers/md/dm.c  | 59 ++--
 2 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..cfabc1c91f9f 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -116,6 +116,19 @@ struct mapped_device {
struct srcu_struct io_barrier;
 };
 
+/*
+ * Bits for the flags field of struct mapped_device.
+ */
+#define DMF_BLOCK_IO_FOR_SUSPEND 0
+#define DMF_SUSPENDED 1
+#define DMF_FROZEN 2
+#define DMF_FREEING 3
+#define DMF_DELETING 4
+#define DMF_NOFLUSH_SUSPENDING 5
+#define DMF_DEFERRED_REMOVE 6
+#define DMF_SUSPENDED_INTERNALLY 7
+#define DMF_POST_SUSPENDING 8
+
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
 void disable_write_zeroes(struct mapped_device *md);
@@ -173,6 +186,45 @@ struct dm_table {
 #endif
 };
 
+/*
+ * One of these is allocated per clone bio.
+ */
+#define DM_TIO_MAGIC 7282014
+struct dm_target_io {
+   unsigned int magic;
+   struct dm_io *io;
+   struct dm_target *ti;
+   unsigned int target_bio_nr;
+   unsigned int *len_ptr;
+   bool inside_dm_io;
+   struct bio clone;
+};
+
+/*
+ * One of these is allocated per original bio.
+ * It contains the first clone used for that original.
+ */
+#define DM_IO_MAGIC 5191977
+struct dm_io {
+   unsigned int magic;
+   struct mapped_device *md;
+   blk_status_t status;
+   atomic_t io_count;
+   struct bio *orig_bio;
+   unsigned long start_time;
+   spinlock_t endio_lock;
+   struct dm_stats_aux stats_aux;
+   /* last member of dm_target_io is 'struct bio' */
+   struct dm_target_io tio;
+};
+
+static inline void dm_io_inc_pending(struct dm_io *io)
+{
+   atomic_inc(&io->io_count);
+}
+
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error);
+
 static inline struct completion *dm_get_completion_from_kobject(struct kobject 
*kobj)
 {
return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4426019a89cc..563504163b74 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -74,38 +74,6 @@ struct clone_info {
unsigned sector_count;
 };
 
-/*
- * One of these is allocated per clone bio.
- */
-#define DM_TIO_MAGIC 7282014
-struct dm_target_io {
-   unsigned magic;
-   struct dm_io *io;
-   struct dm_target *ti;
-   unsigned target_bio_nr;
-   unsigned *len_ptr;
-   bool inside_dm_io;
-   struct bio clone;
-};
-
-/*
- * One of these is allocated per original bio.
- * It contains the first clone used for that original.
- */
-#define DM_IO_MAGIC 5191977
-struct dm_io {
-   unsigned magic;
-   struct mapped_device *md;
-   blk_status_t status;
-   atomic_t io_count;
-   struct bio *orig_bio;
-   unsigned long start_time;
-   spinlock_t endio_lock;
-   struct dm_stats_aux stats_aux;
-   /* last member of dm_target_io is 'struct bio' */
-   struct dm_target_io tio;
-};
-
 #define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
 #define DM_IO_BIO_OFFSET \
(offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio))
@@ -137,19 +105,6 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr);
 
 #define MINOR_ALLOCED ((void *)-1)
 
-/*
- * Bits for the md->flags field.
- */
-#define DMF_BLOCK_IO_FOR_SUSPEND 0
-#define DMF_SUSPENDED 1
-#define DMF_FROZEN 2
-#define DMF_FREEING 3
-#define DMF_DELETING 4
-#define DMF_NOFLUSH_SUSPENDING 5
-#define DMF_DEFERRED_REMOVE 6
-#define DMF_SUSPENDED_INTERNALLY 7
-#define DMF_POST_SUSPENDING 8
-
 #define DM_NUMA_NODE NUMA_NO_NODE
 static int dm_numa_node = DM_NUMA_NODE;
 
@@ -825,7 +780,7 @@ static int __noflush_suspending(struct mapped_device *md)
  * Decrements the number of outstanding ios that a bio has been
  * cloned into, completing the original io if necc.
  */
-static void dec_pending(struct dm_io *io, blk_status_t error)
+void dm_io_dec_pending(struct dm_io *io, blk_status_t error)
 {
unsigned long flags;
blk_status_t io_error;
@@ -978,7 +933,7 @@ static void clone_endio(struct bio *bio)
}
 
free_tio(tio);
-   dec_pending(io, error);
+   dm_io_dec_pending(io, error);
 }
 
 /*
@@ -1247,7 +1202,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 * anything, the target has assumed ownership of
 

[dm-devel] [PATCH v3 10/11] dm: introduce zone append emulation

2021-05-20 Thread Damien Le Moal
For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.

To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intercepted in __map_bio() and
processed using the new function dm_zone_map_bio().

Detection of the target ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operations, either by explicitly declaring it using the new
struct dm_target field zone_append_not_supported, or because the device
table contains a non-zoned device, has its mapped device marked with the
new flag DMF_ZONE_APPEND_EMULATED. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.

Atomicity of the zone write pointer tracking and updates is done using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO fails
before submission or when dm_zone_endio() is called once the clone BIO
completes.

The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().

For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.

All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled, since all DM zone
related functions are stubbed out in that case.
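
The heart of the emulation, condensed into a sketch (simplified from
this patch; locking, write pointer validation and error handling are
omitted):

#include <linux/bio.h>
#include <linux/blk_types.h>

/*
 * In dm_zone_map_bio_begin(), a zone append clone becomes a
 * non-mergeable regular write aimed at the tracked write pointer
 * offset of its target zone.
 */
static void example_remap_zone_append(struct bio *orig_bio,
				      struct bio *clone,
				      unsigned int zwp_offset)
{
	if (bio_op(orig_bio) != REQ_OP_ZONE_APPEND)
		return;

	clone->bi_opf = REQ_OP_WRITE | REQ_NOMERGE |
			(orig_bio->bi_opf & ~REQ_OP_MASK);
	clone->bi_iter.bi_sector = orig_bio->bi_iter.bi_sector + zwp_offset;
}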

Signed-off-by: Damien Le Moal 
---
 drivers/md/dm-core.h  |  13 +
 drivers/md/dm-table.c |  19 +-
 drivers/md/dm-zone.c  | 580 --
 drivers/md/dm.c   |  38 ++-
 drivers/md/dm.h   |  16 +-
 include/linux/device-mapper.h |   6 +
 6 files changed, 618 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index cfabc1c91f9f..edc1553c4eea 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -114,6 +114,11 @@ struct mapped_device {
bool init_tio_pdu:1;
 
struct srcu_struct io_barrier;
+
+#ifdef CONFIG_BLK_DEV_ZONED
+   unsigned int nr_zones;
+   unsigned int *zwp_offset;
+#endif
 };
 
 /*
@@ -128,6 +133,7 @@ struct mapped_device {
 #define DMF_DEFERRED_REMOVE 6
 #define DMF_SUSPENDED_INTERNALLY 7
 #define DMF_POST_SUSPENDING 8
+#define DMF_EMULATE_ZONE_APPEND 9
 
 void disable_discard(struct mapped_device *md);
 void disable_write_same(struct mapped_device *md);
@@ -143,6 +149,13 @@ static inline struct dm_stats *dm_get_stats(struct 
mapped_device *md)
return &md->stats;
 }
 
+static inline bool dm_emulate_zone_append(struct mapped_device *md)
+{
+   if (blk_queue_is_zoned(md->queue))
+   return test_bit(DMF_EMULATE_ZONE_APPEND, &md->flags);
+   return false;
+}
+
 #define DM_TABLE_MAX_DEPTH 16
 
 struct dm_table {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index dd9f648ab598..21fdccfb16cf 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1981,11 +1981,12 @@ static int device_requires_stable_pages(struct 
dm_target *ti,
return blk_queue_stable_writes(q);
 }
 
-void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
-  struct queue_limits *limits)
+int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
+ struct queue_limits *limits)
 {
bool wc = false, fua = false;
int page_size = PAGE_SIZE;
+   int r;
 
/*
 * Copy table's limits to the DM device's request_queue
@@ -2064,12 +2065,20 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /* For a zoned target, setup the zone

[dm-devel] [PATCH v3 01/11] block: improve handling of all zones reset operation

2021-05-20 Thread Damien Le Moal
SCSI, ZNS and null_blk zoned devices support resetting all zones using
a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device
request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for
device mapper targets creating zoned devices. In this case, a user
request for resetting all zones of a device is processed in
blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each
zone of the device. This leads to different behaviors of the
BLKRESETZONE ioctl() depending on the target device support for the
reset all operation. E.g.

blkzone reset /dev/sdX

will reset all zones of a SCSI device using a single command that will
ignore conventional, read-only or offline zones.

But a dm-linear device including conventional, read-only or offline
zones cannot be reset in the same manner as some of the single zone
reset operations issued by blkdev_zone_mgmt() will fail. E.g.:

blkzone reset /dev/dm-Y
blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error

To simplify applications and tools development, unify the behavior of
the all-zone reset operation by modifying blkdev_zone_mgmt() to not
issue a zone reset operation for conventional, read-only and offline
zones, thus mimicking what an actual reset-all device command does on a
device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using
the new function blkdev_zone_reset_all_emulated(). The zones needing a
reset are identified using a bitmap that is initialized using a zone
report. Since empty zones do not need a reset, also ignore these zones.
The function blkdev_zone_reset_all() is introduced for block devices
natively supporting reset all operations. blkdev_zone_mgmt() is modified
to call either function to execute an all zone reset request.

Signed-off-by: Damien Le Moal 
[hch: split into multiple functions]
Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-zoned.c | 117 +++---
 1 file changed, 90 insertions(+), 27 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 250cb76ee615..0ded16b0f713 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -161,18 +161,87 @@ int blkdev_report_zones(struct block_device *bdev, 
sector_t sector,
 }
 EXPORT_SYMBOL_GPL(blkdev_report_zones);
 
-static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
-   sector_t sector,
-   sector_t nr_sectors)
+static inline unsigned long *blk_alloc_zone_bitmap(int node,
+  unsigned int nr_zones)
 {
-   if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
-   return false;
+   return kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(unsigned long),
+   GFP_NOIO, node);
+}
 
+static int blk_zone_need_reset_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
+{
/*
-* REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
-* of the applicable zone range is the entire disk.
+* For an all-zones reset, ignore conventional, empty, read-only
+* and offline zones.
 */
-   return !sector && nr_sectors == get_capacity(bdev->bd_disk);
+   switch (zone->cond) {
+   case BLK_ZONE_COND_NOT_WP:
+   case BLK_ZONE_COND_EMPTY:
+   case BLK_ZONE_COND_READONLY:
+   case BLK_ZONE_COND_OFFLINE:
+   return 0;
+   default:
+   set_bit(idx, (unsigned long *)data);
+   return 0;
+   }
+}
+
+static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
+ gfp_t gfp_mask)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   sector_t capacity = get_capacity(bdev->bd_disk);
+   sector_t zone_sectors = blk_queue_zone_sectors(q);
+   unsigned long *need_reset;
+   struct bio *bio = NULL;
+   sector_t sector;
+   int ret;
+
+   need_reset = blk_alloc_zone_bitmap(q->node, q->nr_zones);
+   if (!need_reset)
+   return -ENOMEM;
+
+   ret = bdev->bd_disk->fops->report_zones(bdev->bd_disk, 0,
+   q->nr_zones, blk_zone_need_reset_cb,
+   need_reset);
+   if (ret < 0)
+   goto out_free_need_reset;
+
+   ret = 0;
+   while (sector < capacity) {
+   if (!test_bit(blk_queue_zone_no(q, sector), need_reset)) {
+   sector += zone_sectors;
+   continue;
+   }
+   bio = blk_next_bio(bio, 0, gfp_mask);
+   bio_set_dev(bio, bdev);
+   bio->bi_opf = REQ_OP_ZONE_RESET | REQ_SYNC;
+   bio->bi_iter.bi_sector = sector;
+   sector += zone_sectors;
+
+   /* This may take a while, so

[dm-devel] [PATCH v3 02/11] block: introduce bio zone helpers

2021-05-20 Thread Damien Le Moal
Introduce the helper functions bio_zone_no() and bio_zone_is_seq().
Both are the BIO counterparts of the request helpers blk_rq_zone_no()
and blk_rq_zone_is_seq(), respectively returning the number of the
target zone of a bio and true if the BIO target zone is sequential.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 include/linux/blkdev.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f69c75bd6d27..2db0f376f5d9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1008,6 +1008,18 @@ static inline unsigned int blk_rq_stats_sectors(const 
struct request *rq)
 /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
 const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
 
+static inline unsigned int bio_zone_no(struct bio *bio)
+{
+   return blk_queue_zone_no(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
+static inline unsigned int bio_zone_is_seq(struct bio *bio)
+{
+   return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
 static inline unsigned int blk_rq_zone_no(struct request *rq)
 {
return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 07/11] dm: Introduce dm_report_zones()

2021-05-20 Thread Damien Le Moal
To simplify the implementation of the report_zones operation of a zoned
target, introduce the function dm_report_zones() to set a target
mapping start sector in struct dm_report_zones_args and call
blkdev_report_zones(). This new function is exported and the report
zones callback function dm_report_zones_cb() is not.

dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-crypt.c |  7 +++
 drivers/md/dm-flakey.c|  7 +++
 drivers/md/dm-linear.c|  7 +++
 drivers/md/dm-zone.c  | 23 ---
 include/linux/device-mapper.h |  3 ++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b0ab080f2567..f410ceee51d7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3138,11 +3138,10 @@ static int crypt_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct crypt_config *cc = ti->private;
-   sector_t sector = cc->start + dm_target_offset(ti, args->next_sector);
 
-   args->start = cc->start;
-   return blkdev_report_zones(cc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(cc->dev->bdev, cc->start,
+   cc->start + dm_target_offset(ti, args->next_sector),
+   args, nr_zones);
 }
 #else
 #define crypt_report_zones NULL
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b7fee9936f05..5877220c01ed 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -463,11 +463,10 @@ static int flakey_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct flakey_c *fc = ti->private;
-   sector_t sector = flakey_map_sector(ti, args->next_sector);
 
-   args->start = fc->start;
-   return blkdev_report_zones(fc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(fc->dev->bdev, fc->start,
+  flakey_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define flakey_report_zones NULL
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 92db0f5e7f28..c91f1e2e2f65 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -140,11 +140,10 @@ static int linear_report_zones(struct dm_target *ti,
struct dm_report_zones_args *args, unsigned int nr_zones)
 {
struct linear_c *lc = ti->private;
-   sector_t sector = linear_map_sector(ti, args->next_sector);
 
-   args->start = lc->start;
-   return blkdev_report_zones(lc->dev->bdev, sector, nr_zones,
-  dm_report_zones_cb, args);
+   return dm_report_zones(lc->dev->bdev, lc->start,
+  linear_map_sector(ti, args->next_sector),
+  args, nr_zones);
 }
 #else
 #define linear_report_zones NULL
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 3243c42b7951..b42474043249 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -56,7 +56,8 @@ int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
return ret;
 }
 
-int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+static int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx,
+ void *data)
 {
struct dm_report_zones_args *args = data;
sector_t sector_diff = args->tgt->begin - args->start;
@@ -84,7 +85,24 @@ int dm_report_zones_cb(struct blk_zone *zone, unsigned int 
idx, void *data)
args->next_sector = zone->start + zone->len;
return args->orig_cb(zone, args->zone_idx++, args->orig_data);
 }
-EXPORT_SYMBOL_GPL(dm_report_zones_cb);
+
+/*
+ * Helper for drivers of zoned targets to implement struct target_type
+ * report_zones operation.
+ */
+int dm_report_zones(struct block_device *bdev, sector_t start, sector_t sector,
+   struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+   /*
+* Set the target mapping start sector first so that
+* dm_report_zones_cb() can correctly remap zone information.
+*/
+   args->start = start;
+
+   return blkdev_report_zones(bdev, sector, nr_zones,
+  dm_report_zones_cb, args);
+}
+EXPORT_SYMBOL_GPL(dm_report_zones);
 
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
@@ -99,4 +117,3 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q)
WARN_ON_ONCE(queue_is_mq(q

[dm-devel] [PATCH v3 08/11] dm: Forbid requeue of writes to zones

2021-05-20 Thread Damien Le Moal
A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target requests such a requeue, warn about
it and kill the IO.

The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.

This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as
none of these targets ever request a requeue.
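
As an illustration, a hypothetical (made-up) zoned target completion
method that this patch now catches:

#include <linux/device-mapper.h>

/*
 * Requesting a requeue for a write to a sequential zone is exactly
 * what is forbidden here: DM core now warns once and fails the IO
 * with BLK_STS_IOERR instead of requeuing it out of order.
 */
static int bad_example_end_io(struct dm_target *ti, struct bio *bio,
			      blk_status_t *error)
{
	return DM_ENDIO_REQUEUE;	/* invalid for zoned sequential writes */
}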

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/dm-zone.c | 17 +
 drivers/md/dm.c  | 18 +++---
 drivers/md/dm.h  |  5 +
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index b42474043249..edc3bbb45637 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -104,6 +104,23 @@ int dm_report_zones(struct block_device *bdev, sector_t 
start, sector_t sector,
 }
 EXPORT_SYMBOL_GPL(dm_report_zones);
 
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   struct request_queue *q = md->queue;
+
+   if (!blk_queue_is_zoned(q))
+   return false;
+
+   switch (bio_op(bio)) {
+   case REQ_OP_WRITE_ZEROES:
+   case REQ_OP_WRITE_SAME:
+   case REQ_OP_WRITE:
+   return !op_is_flush(bio->bi_opf) && bio_sectors(bio);
+   default:
+   return false;
+   }
+}
+
 void dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q)
 {
if (!blk_queue_is_zoned(q))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 45d2dc2ee844..4426019a89cc 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -846,11 +846,15 @@ static void dec_pending(struct dm_io *io, blk_status_t 
error)
 * Target requested pushing back the I/O.
 */
spin_lock_irqsave(&md->deferred_lock, flags);
-   if (__noflush_suspending(md))
+   if (__noflush_suspending(md) &&
+   !WARN_ON_ONCE(dm_is_zone_write(md, bio)))
/* NOTE early return due to BLK_STS_DM_REQUEUE 
below */
bio_list_add_head(&md->deferred, io->orig_bio);
else
-   /* noflush suspend was interrupted. */
+   /*
+* noflush suspend was interrupted or this is
+* a write to a zoned target.
+*/
io->status = BLK_STS_IOERR;
spin_unlock_irqrestore(&md->deferred_lock, flags);
}
@@ -947,7 +951,15 @@ static void clone_endio(struct bio *bio)
int r = endio(tio->ti, bio, &error);
switch (r) {
case DM_ENDIO_REQUEUE:
-   error = BLK_STS_DM_REQUEUE;
+   /*
+* Requeuing writes to a sequential zone of a zoned
+* target will break the sequential write pattern:
+* fail such IO.
+*/
+   if (WARN_ON_ONCE(dm_is_zone_write(md, bio)))
+   error = BLK_STS_IOERR;
+   else
+   error = BLK_STS_DM_REQUEUE;
fallthrough;
case DM_ENDIO_DONE:
break;
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index fdf1536a4b62..39c243258e24 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -107,8 +107,13 @@ void dm_set_zones_restrictions(struct dm_table *t, struct 
request_queue *q);
 #ifdef CONFIG_BLK_DEV_ZONED
 int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
unsigned int nr_zones, report_zones_cb cb, void *data);
+bool dm_is_zone_write(struct mapped_device *md, struct bio *bio);
 #else
#define dm_blk_report_zones	NULL
+static inline bool dm_is_zone_write(struct mapped_device *md, struct bio *bio)
+{
+   return false;
+}
 #endif
 
 /*-
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 00/11] dm: Improve zoned block device support

2021-05-20 Thread Damien Le Moal
This series improves device mapper support for zoned block devices and
for targets exposing a zoned device.

The first patch improves support for user requests to reset all zones of
the target device. With the fix, such an operation behaves similarly to
the implementation for physical block devices, based on the single zone
reset command with the ALL bit set.

The following 2 patches are preparatory block layer patches.

Patches 4 and 5 are two small fixes to DM core zoned block device support.

Patch 6 reorganizes DM core code, moving conditionally defined zoned
block device code into the new dm-zone.c file. This avoids sprinkling DM
core with zone related code defined under #ifdef CONFIG_BLK_DEV_ZONED.

Patch 7 improves DM zone report helper functions for target drivers.

Patch 8 fixes a potential problem with BIO requeue on a zoned target.

Finally, patches 9 to 11 implement zone append emulation using regular
writes for target drivers that cannot natively support this BIO type.
The only target currently needing this emulation is dm-crypt. With this
change, a zoned dm-crypt device behaves exactly like a regular zoned
block device, correctly executing user zone append BIOs.

This series passes the following tests:
1) zonefs tests on top of dm-crypt with a zoned nullblk device
2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
3) btrfs fstests on top of dm-crypt with zoned nullblk devices.

Comments are as always welcome.

Changes from v2:
* Replace use of spinlock to protect the zone write pointer offset
  array in patch 11 with READ_ONCE/WRITE_ONCE as suggested by Hannes.
* Added reviewed-by tags

Changes from v1:
* Use Christoph's proposed approach for patch 1 (split reset all
  processing into different functions)
* Changed helpers introduced in patch 2 to remove the request_queue
  argument
* Improve patch 3 commit message as suggested by Christoph (explaining
  that the flag is a special case that cannot use a REQ_XXX flag)
* Changed DMWARN() into DMDEBUG in patch 11 as suggested by Milan
* Added reviewed-by tags

Damien Le Moal (11):
  block: improve handling of all zones reset operation
  block: introduce bio zone helpers
  block: introduce BIO_ZONE_WRITE_LOCKED bio flag
  dm: Fix dm_accept_partial_bio()
  dm: cleanup device_area_is_invalid()
  dm: move zone related code to dm-zone.c
  dm: Introduce dm_report_zones()
  dm: Forbid requeue of writes to zones
  dm: rearrange core declarations
  dm: introduce zone append emulation
  dm crypt: Fix zoned block device support

 block/blk-zoned.c | 117 --
 drivers/md/Makefile   |   4 +
 drivers/md/dm-core.h  |  65 
 drivers/md/dm-crypt.c |  31 +-
 drivers/md/dm-flakey.c|   7 +-
 drivers/md/dm-linear.c|   7 +-
 drivers/md/dm-table.c |  21 +-
 drivers/md/dm-zone.c  | 654 ++
 drivers/md/dm.c   | 202 +++
 drivers/md/dm.h   |  30 +-
 include/linux/blk_types.h |   1 +
 include/linux/blkdev.h|  12 +
 include/linux/device-mapper.h |   9 +-
 13 files changed, 953 insertions(+), 207 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH v3 06/11] dm: move zone related code to dm-zone.c

2021-05-20 Thread Damien Le Moal
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device's request queue zone attributes in
dm_table_set_restrictions().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/Makefile   |   4 ++
 drivers/md/dm-table.c |  14 ++
 drivers/md/dm-zone.c  | 102 ++
 drivers/md/dm.c   |  78 
 drivers/md/dm.h   |  11 +
 5 files changed, 120 insertions(+), 89 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index ef7ddc27685c..a74aaf8b1445 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -92,6 +92,10 @@ ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs+= dm-uevent.o
 endif
 
+ifeq ($(CONFIG_BLK_DEV_ZONED),y)
+dm-mod-objs+= dm-zone.o
+endif
+
 ifeq ($(CONFIG_DM_VERITY_FEC),y)
 dm-verity-objs += dm-verity-fec.o
 endif
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 21fd9cd4da32..dd9f648ab598 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -2064,17 +2064,9 @@ void dm_table_set_restrictions(struct dm_table *t, 
struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /*
-* For a zoned target, the number of zones should be updated for the
-* correct value to be exposed in sysfs queue/nr_zones. For a BIO based
-* target, this is all that is needed.
-*/
-#ifdef CONFIG_BLK_DEV_ZONED
-   if (blk_queue_is_zoned(q)) {
-   WARN_ON_ONCE(queue_is_mq(q));
-   q->nr_zones = blkdev_nr_zones(t->md->disk);
-   }
-#endif
+   /* For a zoned target, set up the zone related queue attributes */
+   if (blk_queue_is_zoned(q))
+   dm_set_zones_restrictions(t, q);
 
dm_update_keyslot_manager(q, t);
blk_queue_update_readahead(q);
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
new file mode 100644
index ..3243c42b7951
--- /dev/null
+++ b/drivers/md/dm-zone.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+
+#include "dm-core.h"
+
+/*
+ * User facing dm device block device report zone operation. This calls the
+ * report_zones operation for each target of a device table. This operation is
+ * generally implemented by targets using dm_report_zones().
+ */
+int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
+   unsigned int nr_zones, report_zones_cb cb, void *data)
+{
+   struct mapped_device *md = disk->private_data;
+   struct dm_table *map;
+   int srcu_idx, ret;
+   struct dm_report_zones_args args = {
+   .next_sector = sector,
+   .orig_data = data,
+   .orig_cb = cb,
+   };
+
+   if (dm_suspended_md(md))
+   return -EAGAIN;
+
+   map = dm_get_live_table(md, &srcu_idx);
+   if (!map) {
+   ret = -EIO;
+   goto out;
+   }
+
+   do {
+   struct dm_target *tgt;
+
+   tgt = dm_table_find_target(map, args.next_sector);
+   if (WARN_ON_ONCE(!tgt->type->report_zones)) {
+   ret = -EIO;
+   goto out;
+   }
+
+   args.tgt = tgt;
+   ret = tgt->type->report_zones(tgt, &args,
+ nr_zones - args.zone_idx);
+   if (ret < 0)
+   goto out;
+   } while (args.zone_idx < nr_zones &&
+args.next_sector < get_capacity(disk));
+
+   ret = args.zone_idx;
+out:
+   dm_put_live_table(md, srcu_idx);
+   return ret;
+}
+
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+{
+   struct dm_report_zones_args *args = data;
+   sector_t sector_diff = args->tgt->begin - args->start;
+
+   /*
+* Ignore zones beyond the target range.
+*/
+   if (zone->start >= args->start + args->tgt->len)
+   return 0;
+
+   /*
+* Remap the start sector and write pointer position of the zone
+* to match its position in the target range.
+*/
+   zone->start += sector_diff;
+   if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
+   if (zone->cond == BLK_ZONE_COND_FULL)
+   zone->wp = zone->start + zone->len;
+   else if (zone->cond == BLK_ZONE_COND_EMPTY)
+

[dm-devel] [PATCH v3 03/11] block: introduce BIO_ZONE_WRITE_LOCKED bio flag

2021-05-20 Thread Damien Le Moal
Introduce the BIO flag BIO_ZONE_WRITE_LOCKED to indicate that a BIO owns
the write lock of the zone it is targeting. This is the counterpart of
the struct request flag RQF_ZONE_WRITE_LOCKED.

This new BIO flag is reserved for now for zone write locking control
for device mapper targets exposing a zoned block device. Since in this
case, the lock flag must not be propagated to the struct request that
will be used to process the BIO, a BIO private flag is used rather than
changing the RQF_ZONE_WRITE_LOCKED request flag into a common REQ_XXX
flag that could be used for both BIO and request. This avoids conflicts
down the stack with the block IO scheduler zone write locking
(in mq-deadline).
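
A hedged sketch of the intended use of the new flag by DM core in the
later patches of this series:

#include <linux/bio.h>

/* Mark a clone BIO as owning the write lock of its target zone... */
static void example_zone_lock_bio(struct bio *clone)
{
	bio_set_flag(clone, BIO_ZONE_WRITE_LOCKED);
}

/* ...and test ownership on completion before unlocking the zone. */
static bool example_bio_owns_zone_lock(struct bio *clone)
{
	return bio_flagged(clone, BIO_ZONE_WRITE_LOCKED);
}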

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..e5cf12f102a2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -304,6 +304,7 @@ enum {
BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
BIO_TRACKED,		/* set if bio goes through the rq_qos path */
BIO_REMAPPED,
+   BIO_ZONE_WRITE_LOCKED,  /* Owns a zoned device zone write lock */
BIO_FLAG_LAST
 };
 
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 10/11] dm: introduce zone append emulation

2021-05-20 Thread Damien Le Moal
On 2021/05/20 15:47, Hannes Reinecke wrote:
> On 5/20/21 8:25 AM, Damien Le Moal wrote:
>> On 2021/05/20 15:10, Hannes Reinecke wrote:
>> [...]
>>>> +/*
>>>> + * First phase of BIO mapping for targets with zone append emulation:
>>>> + * check all BIOs that change a zone write pointer and change zone
>>>> + * append operations into regular write operations.
>>>> + */
>>>> +static bool dm_zone_map_bio_begin(struct mapped_device *md,
>>>> +struct bio *orig_bio, struct bio *clone)
>>>> +{
>>>> +  sector_t zone_sectors = blk_queue_zone_sectors(md->queue);
>>>> +  unsigned int zno = bio_zone_no(orig_bio);
>>>> +  unsigned long flags;
>>>> +  bool good_io = false;
>>>> +
>>>> +  spin_lock_irqsave(&md->zwp_offset_lock, flags);
>>>> +
>>>> +  /*
>>>> +   * If the target zone is in an error state, recover by inspecting the
>>>> +   * zone to get its current write pointer position. Note that since the
>>>> +   * target zone is already locked, a BIO issuing context should never
>>>> +   * see the zone write in the DM_ZONE_UPDATING_WP_OFST state.
>>>> +   */
>>>> +  if (md->zwp_offset[zno] == DM_ZONE_INVALID_WP_OFST) {
>>>> +  unsigned int wp_offset;
>>>> +  int ret;
>>>> +
>>>> +  md->zwp_offset[zno] = DM_ZONE_UPDATING_WP_OFST;
>>>> +
>>>> +  spin_unlock_irqrestore(&md->zwp_offset_lock, flags);
>>>> +  ret = dm_update_zone_wp_offset(md, zno, &wp_offset);
>>>> +  spin_lock_irqsave(&md->zwp_offset_lock, flags);
>>>> +
>>>> +  if (ret) {
>>>> +  md->zwp_offset[zno] = DM_ZONE_INVALID_WP_OFST;
>>>> +  goto out;
>>>> +  }
>>>> +  md->zwp_offset[zno] = wp_offset;
>>>> +  } else if (md->zwp_offset[zno] == DM_ZONE_UPDATING_WP_OFST) {
>>>> +  DMWARN_LIMIT("Invalid DM_ZONE_UPDATING_WP_OFST state");
>>>> +  goto out;
>>>> +  }
>>>> +
>>>> +  switch (bio_op(orig_bio)) {
>>>> +  case REQ_OP_WRITE_ZEROES:
>>>> +  case REQ_OP_WRITE_SAME:
>>>> +  case REQ_OP_WRITE:
>>>> +  break;
>>>> +  case REQ_OP_ZONE_RESET:
>>>> +  case REQ_OP_ZONE_FINISH:
>>>> +  goto good;
>>>> +  case REQ_OP_ZONE_APPEND:
>>>> +  /*
>>>> +   * Change zone append operations into a non-mergeable regular
>>>> +   * writes directed at the current write pointer position of the
>>>> +   * target zone.
>>>> +   */
>>>> +  clone->bi_opf = REQ_OP_WRITE | REQ_NOMERGE |
>>>> +  (orig_bio->bi_opf & (~REQ_OP_MASK));
>>>> +  clone->bi_iter.bi_sector =
>>>> +  orig_bio->bi_iter.bi_sector + md->zwp_offset[zno];
>>>> +  break;
>>>> +  default:
>>>> +  DMWARN_LIMIT("Invalid BIO operation");
>>>> +  goto out;
>>>> +  }
>>>> +
>>>> +  /* Cannot write to a full zone */
>>>> +  if (md->zwp_offset[zno] >= zone_sectors)
>>>> +  goto out;
>>>> +
>>>> +  /* Writes must be aligned to the zone write pointer */
>>>> +  if ((clone->bi_iter.bi_sector & (zone_sectors - 1)) != 
>>>> md->zwp_offset[zno])
>>>> +  goto out;
>>>> +
>>>> +good:
>>>> +  good_io = true;
>>>> +
>>>> +out:
>>>> +  spin_unlock_irqrestore(&md->zwp_offset_lock, flags);
>>>
>>> I'm not happy with the spinlock. Can't the same effect be achieved with
>>> some clever READ_ONCE()/WRITE_ONCE/cmpexch magic?
>>> Especially as you have a separate 'zone lock' mechanism ...
>>
>> hmmm... Let me see. Given that the bio completion path is relatively simple,
>> it may be possible. With more coffee, I may be able to come up with something
>> clever.
>>
> More coffee is always a good idea :-)
> I would look at killing the intermediate state UPDATING_WP_OFST and only 
> update the pointer on endio (or if it failed).
> That way we would need to update the pointer only once if we have a 
> final state, and don't need to do the double 

Re: [dm-devel] [PATCH v2 10/11] dm: introduce zone append emulation

2021-05-20 Thread Damien Le Moal
md->zwp_offset[zno] = 0;
>> +break;
>> +case REQ_OP_ZONE_FINISH:
>> +md->zwp_offset[zno] = blk_queue_zone_sectors(md->queue);
>> +break;
>> +case REQ_OP_WRITE_ZEROES:
>> +case REQ_OP_WRITE_SAME:
>> +case REQ_OP_WRITE:
>> +md->zwp_offset[zno] += nr_sectors;
>> +break;
>> +case REQ_OP_ZONE_APPEND:
>> +/*
>> + * Check that the target did not truncate the write operation
>> + * emulating a zone append.
>> + */
>> +if (nr_sectors != bio_sectors(orig_bio)) {
>> +DMWARN_LIMIT("Truncated write for zone append");
>> +sts = BLK_STS_IOERR;
>> +break;
>> +}
>> +md->zwp_offset[zno] += nr_sectors;
>> +break;
>> +default:
>> +DMWARN_LIMIT("Invalid BIO operation");
>> +sts = BLK_STS_IOERR;
>> +break;
>> +}
>> +
>> +spin_unlock_irqrestore(&md->zwp_offset_lock, flags);
> 
> You don't need the spinlock here; using WRITE_ONCE() should be sufficient.

If other references to zwp_offset use READ_ONCE(), no ?
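
For reference, a sketch of the lockless pattern under discussion (v3 of
the series adopts it, per the changelog). Correctness relies on the
per-zone write lock already serializing updaters, so READ_ONCE() and
WRITE_ONCE() only need to prevent torn or compiler-cached accesses:

#include <linux/compiler.h>

/* Lockless advance of one entry of the write pointer offset array. */
static void example_advance_wp(unsigned int *zwp_offset, unsigned int zno,
			       unsigned int nr_sectors)
{
	unsigned int wp = READ_ONCE(zwp_offset[zno]);

	WRITE_ONCE(zwp_offset[zno], wp + nr_sectors);
}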

[...]
>> +void dm_zone_endio(struct dm_io *io, struct bio *clone)
>> +{
>> +        struct mapped_device *md = io->md;
>> +        struct request_queue *q = md->queue;
>> +        struct bio *orig_bio = io->orig_bio;
>> +        unsigned long flags;
>> +        unsigned int zno;
>> +
>> +        /*
>> +         * For targets that do not emulate zone append, we only need to
>> +         * handle native zone-append bios.
>> +         */
>> +        if (!dm_emulate_zone_append(md)) {
>> +                /*
>> +                 * Get the offset within the zone of the written sector
>> +                 * and add that to the original bio sector position.
>> +                 */
>> +                if (clone->bi_status == BLK_STS_OK &&
>> +                    bio_op(clone) == REQ_OP_ZONE_APPEND) {
>> +                        sector_t mask = (sector_t)blk_queue_zone_sectors(q) - 1;
>> +
>> +                        orig_bio->bi_iter.bi_sector +=
>> +                                clone->bi_iter.bi_sector & mask;
>> +                }
>> +
>> +                return;
>> +        }
>> +
>> +        /*
>> +         * For targets that do emulate zone append, if the clone BIO does not
>> +         * own the target zone write lock, we have nothing to do.
>> +         */
>> +        if (!bio_flagged(clone, BIO_ZONE_WRITE_LOCKED))
>> +                return;
>> +
>> +        zno = bio_zone_no(orig_bio);
>> +
>> +        spin_lock_irqsave(&md->zwp_offset_lock, flags);
>> +        if (clone->bi_status != BLK_STS_OK) {
>> +                /*
>> +                 * BIOs that modify a zone write pointer may leave the zone
>> +                 * in an unknown state in case of failure (e.g. the write
>> +                 * pointer was only partially advanced). In this case, set
>> +                 * the target zone write pointer as invalid unless it is
>> +                 * already being updated.
>> +                 */
>> +                if (md->zwp_offset[zno] != DM_ZONE_UPDATING_WP_OFST)
>> +                        md->zwp_offset[zno] = DM_ZONE_INVALID_WP_OFST;
>> +        } else if (bio_op(orig_bio) == REQ_OP_ZONE_APPEND) {
>> +                /*
>> +                 * Get the written sector for zone append operations that
>> +                 * were emulated using regular write operations.
>> +                 */
>> +                if (WARN_ON_ONCE(md->zwp_offset[zno] < bio_sectors(orig_bio)))
>> +                        md->zwp_offset[zno] = DM_ZONE_INVALID_WP_OFST;
>> +                else
>> +                        orig_bio->bi_iter.bi_sector +=
>> +                                md->zwp_offset[zno] - bio_sectors(orig_bio);
>> +        }
>> +        spin_unlock_irqrestore(&md->zwp_offset_lock, flags);
>> +
> 
> Similar comments to the spinlock here.

OK.

Thanks for the review.


-- 
Damien Le Moal
Western Digital Research






[dm-devel] [PATCH v2 11/11] dm crypt: Fix zoned block device support

2021-05-19 Thread Damien Le Moal
Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual sector location to
write. The write location is determined by the device and returned to
the host upon completion of the operation. This interface, while simple
and efficient for writing into sequential zones of a zoned block
device, is incompatible with the use of sector values to calculate a
cipher block IV. All data written in a zone ends up using the same IV
values, corresponding to the first sectors of the zone, but read
operations will specify any sector within the zone, resulting in an IV
mismatch between encryption and decryption.

To solve this problem, report to DM core that zone append operations are
not supported. This results in zone append operations being emulated
using regular write operations.

Reported-by: Shin'ichiro Kawasaki 
Signed-off-by: Damien Le Moal 
---
 drivers/md/dm-crypt.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f410ceee51d7..50f4cbd600d5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3280,14 +3280,28 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
}
cc->start = tmpll;
 
-   /*
-* For zoned block devices, we need to preserve the issuer write
-* ordering. To do so, disable write workqueues and force inline
-* encryption completion.
-*/
if (bdev_is_zoned(cc->dev->bdev)) {
+   /*
+    * For zoned block devices, we need to preserve the issuer write
+    * ordering. To do so, disable write workqueues and force inline
+    * encryption completion.
+    */
set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+
+   /*
+    * All zone append writes to a zone of a zoned block device will
+    * have the same BIO sector, the start of the zone. When the
+    * cipher IV mode uses sector values, all data targeting a
+    * zone will be encrypted using the first sector numbers of the
+    * zone. This will not result in write errors but will
+    * cause most reads to fail, as reads will use the sector values
+    * for the actual data locations, resulting in an IV mismatch.
+    * To avoid this problem, ask DM core to emulate zone append
+    * operations with regular writes.
+    */
+   DMDEBUG("Zone append operations will be emulated");
+   ti->emulate_zone_append = true;
}
 
if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
-- 
2.31.1
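To illustrate the IV mismatch with a sector-based IV such as plain64 (the IV
is derived directly from the sector number; the sector values below are
hypothetical):

/* Two zone append writes both carry the zone start sector in the BIO. */
sector_t zone_start = 524288;

/* write A: encrypted with IV(524288), device places it at sector 524288 */
/* write B: encrypted with IV(524288), device places it at sector 524296 */

/* A later read of write B uses its actual on-disk location ...          */
sector_t read_sector = 524296;
/* ... so decryption uses IV(524296) != IV(524288) and returns garbage.  */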




[dm-devel] [PATCH v2 04/11] dm: Fix dm_accept_partial_bio()

2021-05-19 Thread Damien Le Moal
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed, as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.

Blank lines are added around the group of BUG_ON() calls to make the
code more legible.

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
---
 drivers/md/dm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ca2aedd8ee7d..a9211575bfed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1237,8 +1237,9 @@ static int dm_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
- * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
- * REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH.
+ * allowed for all bio types except REQ_PREFLUSH, zone management operations
+ * (REQ_OP_ZONE_RESET, REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and
+ * REQ_OP_ZONE_FINISH) and zone append writes.
  *
  * dm_accept_partial_bio informs the dm that the target only wants to process
  * additional n_sectors sectors of the bio and the rest of the data should be
@@ -1268,9 +1269,13 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
 {
struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
unsigned bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+
BUG_ON(bio->bi_opf & REQ_PREFLUSH);
+   BUG_ON(op_is_zone_mgmt(bio_op(bio)));
+   BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);
BUG_ON(bi_size > *tio->len_ptr);
BUG_ON(n_sectors > bi_size);
+
*tio->len_ptr -= bi_size - n_sectors;
bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
 }
-- 
2.31.1
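For context, a minimal sketch of how a target map method typically uses
dm_accept_partial_bio() within the restrictions enforced by the new BUG_ON()
calls (the target and its per-I/O limit are hypothetical):

static int example_map(struct dm_target *ti, struct bio *bio)
{
        unsigned int max_sectors = 128; /* hypothetical per-I/O limit */

        /*
         * Splitting is only allowed for regular I/O: flushes, zone
         * management operations and zone append writes must be
         * processed whole.
         */
        if (!(bio->bi_opf & REQ_PREFLUSH) &&
            !op_is_zone_mgmt(bio_op(bio)) &&
            bio_op(bio) != REQ_OP_ZONE_APPEND &&
            bio_sectors(bio) > max_sectors)
                dm_accept_partial_bio(bio, max_sectors);

        /* remap and resubmit as usual */
        return DM_MAPIO_REMAPPED;
}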




[dm-devel] [PATCH v2 06/11] dm: move zone related code to dm-zone.c

2021-05-19 Thread Damien Le Moal
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device's request queue zone attributes in
dm_table_set_restrictions().

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
---
 drivers/md/Makefile   |   4 ++
 drivers/md/dm-table.c |  14 ++
 drivers/md/dm-zone.c  | 102 ++
 drivers/md/dm.c   |  78 
 drivers/md/dm.h   |  11 +
 5 files changed, 120 insertions(+), 89 deletions(-)
 create mode 100644 drivers/md/dm-zone.c

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index ef7ddc27685c..a74aaf8b1445 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -92,6 +92,10 @@ ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs+= dm-uevent.o
 endif
 
+ifeq ($(CONFIG_BLK_DEV_ZONED),y)
+dm-mod-objs+= dm-zone.o
+endif
+
 ifeq ($(CONFIG_DM_VERITY_FEC),y)
 dm-verity-objs += dm-verity-fec.o
 endif
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 21fd9cd4da32..dd9f648ab598 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -2064,17 +2064,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
dm_table_any_dev_attr(t, device_is_not_random, NULL))
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
 
-   /*
-* For a zoned target, the number of zones should be updated for the
-* correct value to be exposed in sysfs queue/nr_zones. For a BIO based
-* target, this is all that is needed.
-*/
-#ifdef CONFIG_BLK_DEV_ZONED
-   if (blk_queue_is_zoned(q)) {
-   WARN_ON_ONCE(queue_is_mq(q));
-   q->nr_zones = blkdev_nr_zones(t->md->disk);
-   }
-#endif
+   /* For a zoned target, setup the zones related queue attributes */
+   if (blk_queue_is_zoned(q))
+   dm_set_zones_restrictions(t, q);
 
dm_update_keyslot_manager(q, t);
blk_queue_update_readahead(q);
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
new file mode 100644
index ..3243c42b7951
--- /dev/null
+++ b/drivers/md/dm-zone.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+
+#include "dm-core.h"
+
+/*
+ * User facing dm device block device report zone operation. This calls the
+ * report_zones operation for each target of a device table. This operation is
+ * generally implemented by targets using dm_report_zones().
+ */
+int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
+   unsigned int nr_zones, report_zones_cb cb, void *data)
+{
+   struct mapped_device *md = disk->private_data;
+   struct dm_table *map;
+   int srcu_idx, ret;
+   struct dm_report_zones_args args = {
+   .next_sector = sector,
+   .orig_data = data,
+   .orig_cb = cb,
+   };
+
+   if (dm_suspended_md(md))
+   return -EAGAIN;
+
+   map = dm_get_live_table(md, &srcu_idx);
+   if (!map) {
+   ret = -EIO;
+   goto out;
+   }
+
+   do {
+   struct dm_target *tgt;
+
+   tgt = dm_table_find_target(map, args.next_sector);
+   if (WARN_ON_ONCE(!tgt->type->report_zones)) {
+   ret = -EIO;
+   goto out;
+   }
+
+   args.tgt = tgt;
+   ret = tgt->type->report_zones(tgt, &args,
+ nr_zones - args.zone_idx);
+   if (ret < 0)
+   goto out;
+   } while (args.zone_idx < nr_zones &&
+args.next_sector < get_capacity(disk));
+
+   ret = args.zone_idx;
+out:
+   dm_put_live_table(md, srcu_idx);
+   return ret;
+}
+
+int dm_report_zones_cb(struct blk_zone *zone, unsigned int idx, void *data)
+{
+   struct dm_report_zones_args *args = data;
+   sector_t sector_diff = args->tgt->begin - args->start;
+
+   /*
+* Ignore zones beyond the target range.
+*/
+   if (zone->start >= args->start + args->tgt->len)
+   return 0;
+
+   /*
+* Remap the start sector and write pointer position of the zone
+* to match its position in the target range.
+*/
+   zone->start += sector_diff;
+   if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
+   if (zone->cond == BLK_ZONE_COND_FULL)
+   zone->wp = zone->start + zone->len;
+   else if (zone->cond == BLK_ZONE_COND_EMPTY)
+   

[dm-devel] [PATCH v2 02/11] block: introduce bio zone helpers

2021-05-19 Thread Damien Le Moal
Introduce the helper functions bio_zone_no() and bio_zone_is_seq().
Both are the BIO counterparts of the request helpers blk_rq_zone_no()
and blk_rq_zone_is_seq(), respectively returning the number of the
target zone of a bio and true if the BIO target zone is sequential.

Signed-off-by: Damien Le Moal 
---
 include/linux/blkdev.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f69c75bd6d27..2db0f376f5d9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1008,6 +1008,18 @@ static inline unsigned int blk_rq_stats_sectors(const struct request *rq)
 /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
 const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
 
+static inline unsigned int bio_zone_no(struct bio *bio)
+{
+   return blk_queue_zone_no(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
+static inline unsigned int bio_zone_is_seq(struct bio *bio)
+{
+   return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev),
+bio->bi_iter.bi_sector);
+}
+
 static inline unsigned int blk_rq_zone_no(struct request *rq)
 {
return blk_queue_zone_no(rq->q, blk_rq_pos(rq));
-- 
2.31.1
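As a usage sketch, the helpers let BIO-based code index per-zone state
directly (zwp_offset[] here stands in for the per-zone array introduced later
in this series):

/* Only sequential zones have a meaningful write pointer offset. */
if (bio_zone_is_seq(bio)) {
        unsigned int zno = bio_zone_no(bio);

        wp_ofst = md->zwp_offset[zno];
}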



