Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES
On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote: > From: Chaitanya Kulkarni> > This adds a new block layer operation to zero out a range of > LBAs. This allows to implement zeroing for devices that don't use > either discard with a predictable zero pattern or WRITE SAME of zeroes. > The prominent example of that is NVMe with the Write Zeroes command, > but in the future this should also help with improving the way > zeroing discards work. > > Signed-off-by: Chaitanya Kulkarni > --- I think we also to assign queue_limits for stacked devices in blk_stack_limits. Otherwise, looks good. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES
Sounds good, I'll update the whole series and resend it with v2 prefix. On Thu, Nov 17, 2016 at 11:46 PM, Christoph Hellwigwrote: > On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote: >> From: Chaitanya Kulkarni >> >> This adds a new block layer operation to zero out a range of >> LBAs. This allows to implement zeroing for devices that don't use >> either discard with a predictable zero pattern or WRITE SAME of zeroes. >> The prominent example of that is NVMe with the Write Zeroes command, >> but in the future this should also help with improving the way >> zeroing discards work. >> >> Signed-off-by: Chaitanya Kulkarni > > I think you'll need to resend the whole series so that nvme can set > the maximum discard sectors value. > >> @@ -575,9 +575,10 @@ static inline bool wbt_should_throttle(struct rq_wb >> *rwb, struct bio *bio) >> const int op = bio_op(bio); >> >> /* >> - * If not a WRITE (or a discard), do nothing >> + * If not a WRITE (or a discard or write zeroes), do nothing >>*/ >> - if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD)) >> + if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD || >> + op == REQ_OP_WRITE_ZEROES)) >> return false; > > Jens: should we really throttle for discard or write zeroes here? > Those aren't really writeback driven.. > >> +static inline unsigned int bdev_write_zeroes(struct block_device *bdev) >> +{ >> + struct request_queue *q = bdev_get_queue(bdev); >> + >> + if (q) >> + return q->limits.max_write_zeroes_sectors; >> + >> + return 0; > > If this returns a sector value I'd name it bdev_write_zeroes_sectors. > > Otherwise this looks great. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES
On Thu, Nov 17, 2016 at 02:17:11PM -0800, Chaitanya Kulkarni wrote: > From: Chaitanya Kulkarni> > This adds a new block layer operation to zero out a range of > LBAs. This allows to implement zeroing for devices that don't use > either discard with a predictable zero pattern or WRITE SAME of zeroes. > The prominent example of that is NVMe with the Write Zeroes command, > but in the future this should also help with improving the way > zeroing discards work. > > Signed-off-by: Chaitanya Kulkarni I think you'll need to resend the whole series so that nvme can set the maximum discard sectors value. > @@ -575,9 +575,10 @@ static inline bool wbt_should_throttle(struct rq_wb > *rwb, struct bio *bio) > const int op = bio_op(bio); > > /* > - * If not a WRITE (or a discard), do nothing > + * If not a WRITE (or a discard or write zeroes), do nothing >*/ > - if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD)) > + if (!(op == REQ_OP_WRITE || op == REQ_OP_DISCARD || > + op == REQ_OP_WRITE_ZEROES)) > return false; Jens: should we really throttle for discard or write zeroes here? Those aren't really writeback driven.. > +static inline unsigned int bdev_write_zeroes(struct block_device *bdev) > +{ > + struct request_queue *q = bdev_get_queue(bdev); > + > + if (q) > + return q->limits.max_write_zeroes_sectors; > + > + return 0; If this returns a sector value I'd name it bdev_write_zeroes_sectors. Otherwise this looks great. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES
> "Chaitanya" == Chaitanya Kulkarniwrites: Chaitanya> This adds a new block layer operation to zero out a range of Chaitanya> LBAs. This allows to implement zeroing for devices that don't Chaitanya> use either discard with a predictable zero pattern or WRITE Chaitanya> SAME of zeroes. The prominent example of that is NVMe with Chaitanya> the Write Zeroes command, but in the future this should also Chaitanya> help with improving the way zeroing discards work. Looks good. Please also export the queue limit in blk-sysfs.c and create a suitable entry in Documentation/ABI/testing/sysfs-block. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/5] block: add support for REQ_OP_WRITE_ZEROES
From: Chaitanya KulkarniThis adds a new block layer operation to zero out a range of LBAs. This allows to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example of that is NVMe with the Write Zeroes command, but in the future this should also help with improving the way zeroing discards work. Signed-off-by: Chaitanya Kulkarni --- block/bio.c | 1 + block/blk-core.c | 4 block/blk-lib.c | 58 +-- block/blk-merge.c | 17 ++ block/blk-settings.c | 15 block/blk-wbt.c | 5 ++-- include/linux/bio.h | 25 +++- include/linux/blk_types.h | 2 ++ include/linux/blkdev.h| 19 9 files changed, 127 insertions(+), 19 deletions(-) diff --git a/block/bio.c b/block/bio.c index 2cf6eba..39fa10a 100644 --- a/block/bio.c +++ b/block/bio.c @@ -670,6 +670,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask, switch (bio_op(bio)) { case REQ_OP_DISCARD: case REQ_OP_SECURE_ERASE: + case REQ_OP_WRITE_ZEROES: break; case REQ_OP_WRITE_SAME: bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0]; diff --git a/block/blk-core.c b/block/blk-core.c index 473dd69..f1cb1b1 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1945,6 +1945,10 @@ static inline int bio_check_eod(struct bio *bio, unsigned int nr_sectors) if (!bdev_is_zoned(bio->bi_bdev)) goto not_supported; break; + case REQ_OP_WRITE_ZEROES: + if (!bdev_write_zeroes(bio->bi_bdev)) + goto not_supported; + break; default: break; } diff --git a/block/blk-lib.c b/block/blk-lib.c index bfb28b0..bad64bb 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -227,6 +227,55 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, EXPORT_SYMBOL(blkdev_issue_write_same); /** + * __blkdev_issue_write_zeroes - generate number of bios with WRITE ZEROES + * @bdev: blockdev to issue + * @sector:start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * @biop: pointer to anchor bio + * + * Description: + * Generate and issue number of bios(REQ_OP_WRITE_ZEROES) with zerofiled pages. + */ +static int __blkdev_issue_write_zeroes(struct block_device *bdev, + sector_t sector, sector_t nr_sects, gfp_t gfp_mask, + struct bio **biop) +{ + struct bio *bio = *biop; + unsigned int max_write_zeroes_sectors; + struct request_queue *q = bdev_get_queue(bdev); + + if (!q) + return -ENXIO; + + /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */ + max_write_zeroes_sectors = bdev_write_zeroes(bdev); + + if (max_write_zeroes_sectors == 0) + return -EOPNOTSUPP; + + while (nr_sects) { + bio = next_bio(bio, 0, gfp_mask); + bio->bi_iter.bi_sector = sector; + bio->bi_bdev = bdev; + bio_set_op_attrs(bio, REQ_OP_WRITE_ZEROES, 0); + + if (nr_sects > max_write_zeroes_sectors) { + bio->bi_iter.bi_size = max_write_zeroes_sectors << 9; + nr_sects -= max_write_zeroes_sectors; + sector += max_write_zeroes_sectors; + } else { + bio->bi_iter.bi_size = nr_sects << 9; + nr_sects = 0; + } + cond_resched(); + } + + *biop = bio; + return 0; +} + +/** * __blkdev_issue_zeroout - generate number of zero filed write bios * @bdev: blockdev to issue * @sector:start sector @@ -259,6 +308,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, goto out; } + ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask, + biop); + if (ret == 0 || (ret && ret != -EOPNOTSUPP)) + goto out; + ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, ZERO_PAGE(0), biop); if (ret == 0 || (ret && ret != -EOPNOTSUPP)) @@ -304,8 +358,8 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, * the discard request fail, if the discard flag is not set, or if * discard_zeroes_data is not supported, this function will resort to * zeroing the blocks manually, thus provisioning (allocating, - * anchoring) them. If the block device supports the WRITE SAME command - * blkdev_issue_zeroout() will use it to optimize the process of + * anchoring) them. If the block