Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES

2016-11-18 Thread Keith Busch
On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote:
> From: Chaitanya Kulkarni 
> 
> This adds a new block layer operation to zero out a range of
> LBAs. This allows to implement zeroing for devices that don't use
> either discard with a predictable zero pattern or WRITE SAME of zeroes.
> The prominent example of that is NVMe with the Write Zeroes command,
> but in the future this should also help with improving the way
> zeroing discards work.
> 
> Signed-off-by: Chaitanya Kulkarni 
> ---

I think we also to assign queue_limits for stacked devices in
blk_stack_limits. Otherwise, looks good.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES

2016-11-18 Thread chaitany kulkarni
Sounds good, I'll update the whole series and resend it with v2 prefix.



On Thu, Nov 17, 2016 at 11:46 PM, Christoph Hellwig  wrote:
> On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote:
>> From: Chaitanya Kulkarni 
>>
>> This adds a new block layer operation to zero out a range of
>> LBAs. This allows to implement zeroing for devices that don't use
>> either discard with a predictable zero pattern or WRITE SAME of zeroes.
>> The prominent example of that is NVMe with the Write Zeroes command,
>> but in the future this should also help with improving the way
>> zeroing discards work.
>>
>> Signed-off-by: Chaitanya Kulkarni 
>
> I think you'll need to resend the whole series so that nvme can set
> the maximum discard sectors value.
>
>> @@ -575,9 +575,10 @@ static inline bool wbt_should_throttle(struct rq_wb 
>> *rwb, struct bio *bio)
>>   const int op = bio_op(bio);
>>
>>   /*
>> -  * If not a WRITE (or a discard), do nothing
>> +  * If not a WRITE (or a discard or write zeroes), do nothing
>>*/
>> - if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD))
>> + if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
>> + op == REQ_OP_WRITE_ZEROES))
>>   return false;
>
> Jens: should we really throttle for discard or write zeroes here?
> Those aren't really writeback driven..
>
>> +static inline unsigned int bdev_write_zeroes(struct block_device *bdev)
>> +{
>> + struct request_queue *q = bdev_get_queue(bdev);
>> +
>> + if (q)
>> + return q->limits.max_write_zeroes_sectors;
>> +
>> + return 0;
>
> If this returns a sector value I'd name it bdev_write_zeroes_sectors.
>
> Otherwise this looks great.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES

2016-11-17 Thread Christoph Hellwig
On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote:
> From: Chaitanya Kulkarni 
> 
> This adds a new block layer operation to zero out a range of
> LBAs. This allows to implement zeroing for devices that don't use
> either discard with a predictable zero pattern or WRITE SAME of zeroes.
> The prominent example of that is NVMe with the Write Zeroes command,
> but in the future this should also help with improving the way
> zeroing discards work.
> 
> Signed-off-by: Chaitanya Kulkarni 

I think you'll need to resend the whole series so that nvme can set
the maximum discard sectors value.

> @@ -575,9 +575,10 @@ static inline bool wbt_should_throttle(struct rq_wb 
> *rwb, struct bio *bio)
>   const int op = bio_op(bio);
>  
>   /*
> -  * If not a WRITE (or a discard), do nothing
> +  * If not a WRITE (or a discard or write zeroes), do nothing
>*/
> - if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD))
> + if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
> + op == REQ_OP_WRITE_ZEROES))
>   return false;

Jens: should we really throttle for discard or write zeroes here?
Those aren't really writeback driven..

> +static inline unsigned int bdev_write_zeroes(struct block_device *bdev)
> +{
> + struct request_queue *q = bdev_get_queue(bdev);
> +
> + if (q)
> + return q->limits.max_write_zeroes_sectors;
> +
> + return 0;

If this returns a sector value I'd name it bdev_write_zeroes_sectors.

Otherwise this looks great.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES

2016-11-17 Thread Martin K. Petersen
> "Chaitanya" == Chaitanya Kulkarni  writes:

Chaitanya> This adds a new block layer operation to zero out a range of
Chaitanya> LBAs. This allows to implement zeroing for devices that don't
Chaitanya> use either discard with a predictable zero pattern or WRITE
Chaitanya> SAME of zeroes.  The prominent example of that is NVMe with
Chaitanya> the Write Zeroes command, but in the future this should also
Chaitanya> help with improving the way zeroing discards work.

Looks good. Please also export the queue limit in blk-sysfs.c and create
a suitable entry in Documentation/ABI/testing/sysfs-block.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES

2016-11-17 Thread Chaitanya Kulkarni
From: Chaitanya Kulkarni 

This adds a new block layer operation to zero out a range of
LBAs. This allows to implement zeroing for devices that don't use
either discard with a predictable zero pattern or WRITE SAME of zeroes.
The prominent example of that is NVMe with the Write Zeroes command,
but in the future this should also help with improving the way
zeroing discards work.

Signed-off-by: Chaitanya Kulkarni 
---
 block/bio.c   |  1 +
 block/blk-core.c  |  4 
 block/blk-lib.c   | 58 +--
 block/blk-merge.c | 17 ++
 block/blk-settings.c  | 15 
 block/blk-wbt.c   |  5 ++--
 include/linux/bio.h   | 25 +++-
 include/linux/blk_types.h |  2 ++
 include/linux/blkdev.h| 19 
 9 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2cf6eba..39fa10a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -670,6 +670,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
+   case REQ_OP_WRITE_ZEROES:
break;
case REQ_OP_WRITE_SAME:
bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
diff --git a/block/blk-core.c b/block/blk-core.c
index 473dd69..f1cb1b1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1945,6 +1945,10 @@ static inline int bio_check_eod(struct bio *bio, 
unsigned int nr_sectors)
if (!bdev_is_zoned(bio->bi_bdev))
goto not_supported;
break;
+   case REQ_OP_WRITE_ZEROES:
+   if (!bdev_write_zeroes(bio->bi_bdev))
+   goto not_supported;
+   break;
default:
break;
}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index bfb28b0..bad64bb 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -227,6 +227,55 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
 /**
+ * __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES
+ * @bdev:  blockdev to issue
+ * @sector:start sector
+ * @nr_sects:  number of sectors to write
+ * @gfp_mask:  memory allocation flags (for bio_alloc)
+ * @biop:  pointer to anchor bio
+ *
+ * Description:
+ *  Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled 
pages.
+ */
+static int __blkdev_issue_write_zeroes(struct block_device *bdev,
+   sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+   struct bio **biop)
+{
+   struct bio *bio = *biop;
+   unsigned int max_write_zeroes_sectors;
+   struct request_queue *q = bdev_get_queue(bdev);
+
+   if (!q)
+   return -ENXIO;
+
+   /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
+   max_write_zeroes_sectors = bdev_write_zeroes(bdev);
+
+   if (max_write_zeroes_sectors == 0)
+   return -EOPNOTSUPP;
+
+   while (nr_sects) {
+   bio = next_bio(bio, 0, gfp_mask);
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_bdev = bdev;
+   bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0);
+
+   if (nr_sects > max_write_zeroes_sectors) {
+   bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
+   nr_sects -= max_write_zeroes_sectors;
+   sector += max_write_zeroes_sectors;
+   } else {
+   bio->bi_iter.bi_size = nr_sects << 9;
+   nr_sects = 0;
+   }
+   cond_resched();
+   }
+
+   *biop = bio;
+   return 0;
+}
+
+/**
  * __blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:  blockdev to issue
  * @sector:start sector
@@ -259,6 +308,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
goto out;
}
 
+   ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
+   biop);
+   if (ret == 0 || (ret && ret != -EOPNOTSUPP))
+   goto out;
+
ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
ZERO_PAGE(0), biop);
if (ret == 0 || (ret && ret != -EOPNOTSUPP))
@@ -304,8 +358,8 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
  *  the discard request fail, if the discard flag is not set, or if
  *  discard_zeroes_data is not supported, this function will resort to
  *  zeroing the blocks manually, thus provisioning (allocating,
- *  anchoring) them. If the block device supports the WRITE SAME command
- *  blkdev_issue_zeroout() will use it to optimize the process of
+ *  anchoring) them. If the block