[PATCH v5 11/11] Documentation: update notes in biovecs about arbitrarily sized bios
From: Dongsu Park

Update block/biovecs.txt so that it includes a note on what kind of effects
arbitrarily sized bios would bring to the block layer.
Also fix a trivial typo, bio_iter_iovec.

Cc: Christoph Hellwig
Cc: Kent Overstreet
Cc: Jonathan Corbet
Cc: linux-...@vger.kernel.org
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 Documentation/block/biovecs.txt | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 74a32ad..2568958 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed biovecs so that
    normal code doesn't have to deal with bi_bvec_done.

  * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
    constructed from the raw biovecs but taking into account bi_bvec_done and
    bi_size.

@@ -109,3 +109,11 @@ Other implications:
    over all the biovecs in the new bio - which is silly as it's not needed.

    So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
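For readers who have not looked at the iterator helpers the documentation refers to, here is a rough userspace sketch of how a biovec can be "constructed from the raw biovecs but taking into account bi_bvec_done and bi_size". The toy_* names and struct layouts are made up for illustration; this is not the kernel's bio_iter_iovec() implementation, only a model of the idea.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct toy_vec {
	unsigned offset;	/* start of the data within its page */
	unsigned len;		/* bytes described by this vector */
};

struct toy_iter {
	unsigned idx;		/* index of the current raw vector */
	unsigned bvec_done;	/* bytes already consumed in that vector */
	unsigned size;		/* bytes remaining in the whole bio */
};

/*
 * Build the "current" vector from the raw array and the iterator.
 * Only meaningful while it.size > 0.
 */
static struct toy_vec toy_iter_vec(const struct toy_vec *v, struct toy_iter it)
{
	struct toy_vec out;
	unsigned avail = v[it.idx].len - it.bvec_done;

	out.offset = v[it.idx].offset + it.bvec_done;
	out.len = avail < it.size ? avail : it.size;
	return out;
}

/* Advance the iterator by 'bytes'; the raw vectors are never modified. */
static void toy_iter_advance(const struct toy_vec *v, struct toy_iter *it,
			     unsigned bytes)
{
	assert(bytes <= it->size);

	while (bytes) {
		unsigned step = toy_iter_vec(v, *it).len;

		if (bytes < step)
			step = bytes;

		bytes -= step;
		it->size -= step;
		it->bvec_done += step;

		if (it->bvec_done == v[it->idx].len) {
			it->bvec_done = 0;
			it->idx++;
		}
	}
}
```

Because only the iterator changes, a partially completed bio looks like a shorter bio to code that reads it through the iterator, which is the "illusion" the document describes.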
[PATCH v5 05/11] block: remove split code in blkdev_issue_discard
From: Ming Lin

The split code in blkdev_issue_discard() can go away now that any driver
that cares does the split.

Signed-off-by: Ming Lin
---
 block/blk-lib.c | 73 +++--
 1 file changed, 14 insertions(+), 59 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..3bf3c4a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -43,34 +43,17 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
 	struct blk_plug plug;

-	if (!q)
+	if (!q || !nr_sects)
 		return -ENXIO;

 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;

-	/* Zero-sector (unknown) and one-sector granularities are the same. */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -82,52 +65,24 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	bb.wait = &wait;

 	blk_start_plug(&plug);
-	while (nr_sects) {
-		unsigned int req_sects;
-		sector_t end_sect, tmp;

-		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+	bio = bio_alloc(gfp_mask, 1);
+	if (!bio) {
+		ret = -ENOMEM;
+		goto out;
+	}

-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
-		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_end_io = bio_batch_end_io;
+	bio->bi_bdev = bdev;
+	bio->bi_private = &bb;

-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
-		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+	bio->bi_iter.bi_size = nr_sects << 9;

-		bio->bi_iter.bi_size = req_sects << 9;
-		nr_sects -= req_sects;
-		sector = end_sect;
+	atomic_inc(&bb.done);
+	submit_bio(type, bio);

-		atomic_inc(&bb.done);
-		submit_bio(type, bio);
-
-		/*
-		 * We can loop for a long time in here, if someone does
-		 * full device discards (like mkfs). Be nice and allow
-		 * us to schedule out to avoid softlocking if preempt
-		 * is disabled.
-		 */
-		cond_resched();
-	}
+out:
 	blk_finish_plug(&plug);

 	/* Wait for bios in-flight */
-- 
1.9.1
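To make it easier to see what the removed loop was computing, here is a small userspace model of its alignment-aware chunking. It is only a sketch: toy_sector_t and the toy_* functions are hypothetical names, plain division stands in for sector_div(), and max_discard_sectors is assumed to have already been rounded down to a nonzero multiple of the granularity, as the removed setup code did.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_sector_t;	/* stand-in for the kernel's sector_t */

/*
 * Length of the next discard chunk starting at 'sector' with 'nr_sects'
 * sectors still to go. If the chunk is not the last one and would end on
 * a misaligned sector, it is shortened to end at the previous sector that
 * satisfies (end % granularity) == alignment.
 */
static toy_sector_t toy_next_discard_len(toy_sector_t sector,
					 toy_sector_t nr_sects,
					 toy_sector_t max_discard_sectors,
					 toy_sector_t granularity,
					 toy_sector_t alignment)
{
	toy_sector_t req_sects, end_sect;

	assert(granularity > 0 && alignment < granularity);
	assert(max_discard_sectors > 0 &&
	       max_discard_sectors % granularity == 0);

	req_sects = nr_sects < max_discard_sectors ?
		    nr_sects : max_discard_sectors;
	end_sect = sector + req_sects;

	if (req_sects < nr_sects && end_sect % granularity != alignment) {
		end_sect = (end_sect - alignment) / granularity * granularity
			   + alignment;
		req_sects = end_sect - sector;
	}

	return req_sects;
}

/* Number of bios the old loop would have issued for the whole range. */
static unsigned toy_count_discard_bios(toy_sector_t sector,
				       toy_sector_t nr_sects,
				       toy_sector_t max_discard_sectors,
				       toy_sector_t granularity,
				       toy_sector_t alignment)
{
	unsigned n = 0;

	while (nr_sects) {
		toy_sector_t len = toy_next_discard_len(sector, nr_sects,
							max_discard_sectors,
							granularity, alignment);

		sector += len;
		nr_sects -= len;
		n++;
	}

	return n;
}
```

With the patch applied, blkdev_issue_discard() submits one bio covering the whole range, and any chunking of this kind is left to whichever driver needs it.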
[PATCH v5 10/11] block: remove bio_get_nr_vecs()
From: Kent Overstreet

We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.

Signed-off-by: Kent Overstreet
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig
Signed-off-by: Ming Lin
---
 block/bio.c            | 23 ---
 drivers/md/dm-io.c     |  2 +-
 fs/btrfs/compression.c |  5 +
 fs/btrfs/extent_io.c   |  9 ++---
 fs/btrfs/inode.c       |  3 +--
 fs/btrfs/scrub.c       | 18 ++
 fs/direct-io.c         |  2 +-
 fs/ext4/page-io.c      |  3 +--
 fs/ext4/readpage.c     |  2 +-
 fs/f2fs/data.c         |  2 +-
 fs/gfs2/lops.c         |  9 +
 fs/logfs/dev_bdev.c    |  4 ++--
 fs/mpage.c             |  4 ++--
 fs/nilfs2/segbuf.c     |  2 +-
 fs/xfs/xfs_aops.c      |  3 +--
 include/linux/bio.h    |  1 -
 16 files changed, 18 insertions(+), 74 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index da15e9a..f28ca16 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -692,29 +692,6 @@ integrity_clone:
 EXPORT_SYMBOL(bio_clone_bioset);

 /**
- * bio_get_nr_vecs - return approx number of vecs
- * @bdev: I/O target
- *
- * Return the approximate number of pages we can send to this target.
- * There's no guarantee that you will be able to fit this number of pages
- * into a bio, it does not account for dynamic restrictions that vary
- * on offset.
- */
-int bio_get_nr_vecs(struct block_device *bdev)
-{
-	struct request_queue *q = bdev_get_queue(bdev);
-	int nr_pages;
-
-	nr_pages = min_t(unsigned,
-		     queue_max_segments(q),
-		     queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
-
-	return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
-
-}
-EXPORT_SYMBOL(bio_get_nr_vecs);
-
-/**
  * bio_add_pc_page - attempt to add page to bio
  * @q: the target queue
  * @bio: destination bio
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 74adcd2..7d64272 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -314,7 +314,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 		if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
 			num_bvecs = 1;
 		else
-			num_bvecs = min_t(int, bio_get_nr_vecs(where->bdev),
+			num_bvecs = min_t(int, BIO_MAX_PAGES,
 					  dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));

 		bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324..449c752 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -97,10 +97,7 @@ static inline int compressed_bio_size(struct btrfs_root *root,
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
 					u64 first_byte, gfp_t gfp_flags)
 {
-	int nr_vecs;
-
-	nr_vecs = bio_get_nr_vecs(bdev);
-	return btrfs_bio_alloc(bdev, first_byte >> 9, nr_vecs, gfp_flags);
+	return btrfs_bio_alloc(bdev, first_byte >> 9, BIO_MAX_PAGES, gfp_flags);
 }

 static int check_compressed_csum(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02d0581..ba89efd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2802,9 +2802,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 {
 	int ret = 0;
 	struct bio *bio;
-	int nr;
 	int contig = 0;
-	int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED;
 	int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
 	size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE);

@@ -2829,12 +2827,9 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 			return 0;
 		}
 	}
-	if (this_compressed)
-		nr = BIO_MAX_PAGES;
-	else
-		nr = bio_get_nr_vecs(bdev);

-	bio = btrfs_bio_alloc(bdev, sector, nr, GFP_NOFS | __GFP_HIGH);
+	bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES,
+			GFP_NOFS | __GFP_HIGH);
 	if (!bio)
 		return -ENOMEM;

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 855935f..d66b9a3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7959,8 +7959,7 @@ out:
 static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
 				       u64 first_sector, gfp_t gfp_flags)
 {
-	int nr_vecs = bio_get_nr_vecs(bdev);
-	return btrfs_bio_alloc(bdev, first_sector, nr_vecs, gfp_flags);
+	return btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
 }

 static inline int btrfs_lookup_and_bind_dio_csum(struct btrfs_root *root,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f2feab..aab0b9a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -454,27 +454,14 @@ struct
[PATCH v5 03/11] bcache: remove driver private bio splitting code
From: Kent Overstreet

The bcache driver has always accepted arbitrarily large bios and split
them internally. Now that every driver must accept arbitrarily large
bios, this code isn't necessary anymore.

Cc: linux-bca...@vger.kernel.org
Signed-off-by: Kent Overstreet
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 drivers/md/bcache/bcache.h    |  18
 drivers/md/bcache/io.c        | 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c     |  32 +-
 drivers/md/bcache/util.h      |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 7 files changed, 18 insertions(+), 161 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 04f7bc2..6b420a5 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -243,19 +243,6 @@ struct keybuf {
 	DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };

-struct bio_split_pool {
-	struct bio_set		*bio_split;
-	mempool_t		*bio_split_hook;
-};
-
-struct bio_split_hook {
-	struct closure		cl;
-	struct bio_split_pool	*p;
-	struct bio		*bio;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-};
-
 struct bcache_device {
 	struct closure		cl;

@@ -288,8 +275,6 @@ struct bcache_device {
 	int (*cache_miss)(struct btree *, struct search *,
 			  struct bio *, unsigned);
 	int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-	struct bio_split_pool	bio_split_hook;
 };

 struct io {
@@ -454,8 +439,6 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
-
-	struct bio_split_pool	bio_split_hook;
 };

 struct gc_stat {
@@ -873,7 +856,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);

-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index cb64e64a..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@

 #include

-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io = s->bi_end_io;
-	s->bio->bi_private = s->bi_private;
-	bio_endio(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-	s->bio		= bio;
-	s->p		= p;
-
[PATCH v5 08/11] block: kill merge_bvec_fn() completely
From: Kent Overstreet

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-de...@vger.kernel.org
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Neil Brown
Cc: linux-r...@vger.kernel.org
Cc: Christoph Hellwig
Cc: "Martin K. Petersen"
Acked-by: NeilBrown (for the 'md' bits)
Signed-off-by: Kent Overstreet
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 block/blk-merge.c              |  17 +-
 block/blk-settings.c           |  22 ---
 drivers/block/drbd/drbd_int.h  |   1 -
 drivers/block/drbd/drbd_main.c |   1 -
 drivers/block/drbd/drbd_req.c  |  35
 drivers/block/pktcdvd.c        |  21 ---
 drivers/block/rbd.c            |  47 ---
 drivers/md/dm-cache-target.c   |  21 ---
 drivers/md/dm-crypt.c          |  16 --
 drivers/md/dm-era-target.c     |  15 -
 drivers/md/dm-flakey.c         |  16 --
 drivers/md/dm-linear.c         |  16 --
 drivers/md/dm-log-writes.c     |  16 --
 drivers/md/dm-raid.c           |  19 --
 drivers/md/dm-snap.c           |  15 -
 drivers/md/dm-stripe.c         |  21 ---
 drivers/md/dm-table.c          |   8 ---
 drivers/md/dm-thin.c           |  31 --
 drivers/md/dm-verity.c         |  16 --
 drivers/md/dm.c                | 127 +
 drivers/md/dm.h                |   2 -
 drivers/md/linear.c            |  43 --
 drivers/md/md.c                |  26 -
 drivers/md/md.h                |  12
 drivers/md/multipath.c         |  21 ---
 drivers/md/raid0.c             |  56 --
 drivers/md/raid0.h             |   2 -
 drivers/md/raid1.c             |  58 +--
 drivers/md/raid10.c            | 121 +--
 drivers/md/raid5.c             |  32 ---
 include/linux/blkdev.h         |  10
 include/linux/device-mapper.h  |   4 --
 32 files changed, 9 insertions(+), 859 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3707f30..1f5dfa0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -69,24 +69,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *split;
 	struct bio_vec bv, bvprv;
 	struct bvec_iter iter;
-	unsigned seg_size = 0, nsegs = 0;
+	unsigned seg_size = 0, nsegs = 0, sectors = 0;
 	int prev = 0;

-	struct bvec_merge_data bvm = {
-		.bi_bdev	= bio->bi_bdev,
-		.bi_sector	= bio->bi_iter.bi_sector,
-		.bi_size	= 0,
-		.bi_rw		= bio->bi_rw,
-	};
-
 	bio_for_each_segment(bv, bio, iter) {
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			goto split;
-
-		bvm.bi_size += bv.bv_len;
+		sectors += bv.bv_len >> 9;

-		if (bvm.bi_size >> 9 > queue_max_sectors(q))
+		if (sectors > queue_max_sectors(q))
 			goto split;

 		/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..e90d477 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -53,28 +53,6 @@ void blk_queue_unprep_rq(struct request_queue *q, unprep_rq_fn *ufn)
 }
 EXPORT_SYMBOL(blk_queue_unprep_rq);

-/**
- * blk_queue_merge_bvec - set a merge_bvec function for queue
- * @q: queue
- * @mbfn: merge_bvec_fn
- *
- * Usually queues have static limitations on the max sectors or segments that
- * we can put in a request. Stacking drivers may have some settings that
- * are dynamic, and thus we have to query the queue whether it is ok to
- * add a new bio_vec to a bio at a given offset or not. If the block device
- * has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Note that a block device *must* allow a
- * single page to be added to an empty bio. The block device driver may want
- * to use the bio_split() function to deal with these bio's. By default
- * no merge_bvec_fn is defined for a queue, and only the fixed limits are
- * honored.
- */
-void blk_queue_merge_bvec(struct request_queue *q, merge_bvec_fn *mbfn)
-{
-	q->merge_bvec_fn = mbfn;
-}
-EXPORT_SYMBOL(blk_queue_merge_bvec);
-
 void blk_queue_softirq_done(struct request_queue *q, softirq_done_fn *fn)
 {
 	q->softirq_done_fn = fn;
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index efd19c2..7ac66f3 100644
--- a/drivers/block/drbd/drbd_int.h
[PATCH v5 07/11] md/raid5: get rid of bio_fits_rdev()
From: Kent Overstreet

Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
performed by blk_queue_split() in md_make_request().

Cc: Neil Brown
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown
Signed-off-by: Kent Overstreet
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 drivers/md/raid5.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8377e72..8bdf81a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4780,25 +4780,6 @@ static void raid5_align_endio(struct bio *bi, int error)
 	add_bio_to_retry(raid_bi, conf);
 }

-static int bio_fits_rdev(struct bio *bi)
-{
-	struct request_queue *q = bdev_get_queue(bi->bi_bdev);
-
-	if (bio_sectors(bi) > queue_max_sectors(q))
-		return 0;
-	blk_recount_segments(q, bi);
-	if (bi->bi_phys_segments > queue_max_segments(q))
-		return 0;
-
-	if (q->merge_bvec_fn)
-		/* it's too hard to apply the merge_bvec_fn at this stage,
-		 * just just give up
-		 */
-		return 0;
-
-	return 1;
-}
-
 static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
 	struct r5conf *conf = mddev->private;
@@ -4852,11 +4833,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 		align_bi->bi_bdev = rdev->bdev;
 		__clear_bit(BIO_SEG_VALID, &align_bi->bi_flags);

-		if (!bio_fits_rdev(align_bi) ||
-		    is_badblock(rdev, align_bi->bi_iter.bi_sector,
+		if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
 				bio_sectors(align_bi),
 				&first_bad, &bad_sectors)) {
-			/* too big in some way, or has a known bad block */
 			bio_put(align_bi);
 			rdev_dec_pending(rdev, mddev);
 			return 0;
-- 
1.9.1
[PATCH v5 02/11] block: simplify bio_add_page()
From: Kent Overstreet

Since generic_make_request() can now handle arbitrary size bios, all we
have to do is make sure the bvec array doesn't overflow.
__bio_add_page() no longer needs to call ->merge_bvec_fn(), so we can
get rid of unnecessary code paths.

Removing the call to ->merge_bvec_fn() is also fine, as no driver that
implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
method.

Cc: Christoph Hellwig
Cc: Jens Axboe
Signed-off-by: Kent Overstreet
[dpark: rebase and resolve merge conflicts, change a couple of comments,
 make bio_add_page() warn once upon a cloned bio.]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 block/bio.c | 135 +---
 1 file changed, 55 insertions(+), 80 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2a00d34..da15e9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -714,9 +714,23 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);

-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
-			  *page, unsigned int len, unsigned int offset,
-			  unsigned int max_sectors)
+/**
+ * bio_add_pc_page - attempt to add page to bio
+ * @q: the target queue
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist. This can fail for a
+ * number of reasons, such as the bio being full or target block device
+ * limitations. The target block device must allow bio's up to PAGE_SIZE,
+ * so it is always possible to add a single page to an empty bio.
+ *
+ * This should only be used by REQ_PC bios.
+ */
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+		    *page, unsigned int len, unsigned int offset)
 {
 	int retried_segments = 0;
 	struct bio_vec *bvec;
@@ -727,7 +741,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	if (unlikely(bio_flagged(bio, BIO_CLONED)))
 		return 0;

-	if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
+	if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
 		return 0;

 	/*
@@ -740,28 +754,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page

 		if (page == prev->bv_page &&
 		    offset == prev->bv_offset + prev->bv_len) {
-			unsigned int prev_bv_len = prev->bv_len;
 			prev->bv_len += len;
-
-			if (q->merge_bvec_fn) {
-				struct bvec_merge_data bvm = {
-					/* prev_bvec is already charged in
-					   bi_size, discharge it in order to
-					   simulate merging updated prev_bvec
-					   as new bvec. */
-					.bi_bdev = bio->bi_bdev,
-					.bi_sector = bio->bi_iter.bi_sector,
-					.bi_size = bio->bi_iter.bi_size -
-						prev_bv_len,
-					.bi_rw = bio->bi_rw,
-				};
-
-				if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-					prev->bv_len -= len;
-					return 0;
-				}
-			}
-
 			bio->bi_iter.bi_size += len;
 			goto done;
 		}
@@ -804,27 +797,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		blk_recount_segments(q, bio);
 	}

-	/*
-	 * if queue has other restrictions (eg varying max sector size
-	 * depending on offset), it can specify a merge_bvec_fn in the
-	 * queue to get further control
-	 */
-	if (q->merge_bvec_fn) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev = bio->bi_bdev,
-			.bi_sector = bio->bi_iter.bi_sector,
-			.bi_size = bio->bi_iter.bi_size - len,
-			.bi_rw = bio->bi_rw,
-		};
-
-		/*
-		 * merge_bvec_fn() returns number of bytes it can accept
-		 * at this offset
-		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
-			goto failed;
-	}
-
 	/* If we may be able to merge these biovecs, force a recount */
 	if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
 		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
@@ -841,28 +813,6 @@ static int
[PATCH v5 04/11] btrfs: remove bio splitting and merge_bvec_fn() calls
From: Kent Overstreet

Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling ->merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.

Cc: Chris Mason
Cc: Josef Bacik
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Kent Overstreet
Signed-off-by: Chris Mason
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 fs/btrfs/volumes.c | 72 --
 1 file changed, 72 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b438b4..fd25b81 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5833,34 +5833,6 @@ static noinline void btrfs_schedule_bio(struct btrfs_root *root,
 				 &device->work);
 }

-static int bio_size_ok(struct block_device *bdev, struct bio *bio,
-		       sector_t sector)
-{
-	struct bio_vec *prev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned int max_sectors = queue_max_sectors(q);
-	struct bvec_merge_data bvm = {
-		.bi_bdev = bdev,
-		.bi_sector = sector,
-		.bi_rw = bio->bi_rw,
-	};
-
-	if (WARN_ON(bio->bi_vcnt == 0))
-		return 1;
-
-	prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-	if (bio_sectors(bio) > max_sectors)
-		return 0;
-
-	if (!q->merge_bvec_fn)
-		return 1;
-
-	bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
-	if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
-		return 0;
-	return 1;
-}
-
 static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 			      struct bio *bio, u64 physical, int dev_nr,
 			      int rw, int async)
@@ -5894,38 +5866,6 @@ static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 		btrfsic_submit_bio(rw, bio);
 }

-static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
-			      struct bio *first_bio, struct btrfs_device *dev,
-			      int dev_nr, int rw, int async)
-{
-	struct bio_vec *bvec = first_bio->bi_io_vec;
-	struct bio *bio;
-	int nr_vecs = bio_get_nr_vecs(dev->bdev);
-	u64 physical = bbio->stripes[dev_nr].physical;
-
-again:
-	bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
-	if (!bio)
-		return -ENOMEM;
-
-	while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
-		if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-				 bvec->bv_offset) < bvec->bv_len) {
-			u64 len = bio->bi_iter.bi_size;
-
-			atomic_inc(&bbio->stripes_pending);
-			submit_stripe_bio(root, bbio, bio, physical, dev_nr,
-					  rw, async);
-			physical += len;
-			goto again;
-		}
-		bvec++;
-	}
-
-	submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
-	return 0;
-}
-
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 {
 	atomic_inc(&bbio->error);
@@ -5998,18 +5938,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 			continue;
 		}

-		/*
-		 * Check and see if we're ok with this bio based on it's size
-		 * and offset with the given device.
-		 */
-		if (!bio_size_ok(dev->bdev, first_bio,
-				 bbio->stripes[dev_nr].physical >> 9)) {
-			ret = breakup_stripe_bio(root, bbio, first_bio, dev,
-						 dev_nr, rw, async_submit);
-			BUG_ON(ret);
-			continue;
-		}
-
 		if (dev_nr < total_devs - 1) {
 			bio = btrfs_bio_clone(first_bio, GFP_NOFS);
 			BUG_ON(!bio); /* -ENOMEM */
-- 
1.9.1
[PATCH v5 09/11] fs: use helper bio_add_page() instead of open coding on bi_io_vec
From: Kent Overstreet

Call pre-defined helper bio_add_page() instead of open coding for
iterating through bi_io_vec[]. Doing that, it's possible to make some
parts in filesystems and mm/page_io.c simpler than before.

Acked-by: Dave Kleikamp
Cc: Christoph Hellwig
Cc: Al Viro
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Kent Overstreet
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 fs/buffer.c         |  7 ++-
 fs/jfs/jfs_logmgr.c | 14 --
 mm/page_io.c        |  8 +++-
 3 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1cf7a53..95996ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3046,12 +3046,9 @@ static int submit_bh_wbc(int rw, struct buffer_head *bh,

 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio->bi_bdev = bh->b_bdev;
-	bio->bi_io_vec[0].bv_page = bh->b_page;
-	bio->bi_io_vec[0].bv_len = bh->b_size;
-	bio->bi_io_vec[0].bv_offset = bh_offset(bh);

-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = bh->b_size;
+	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+	BUG_ON(bio->bi_iter.bi_size != bh->b_size);

 	bio->bi_end_io = end_bio_bh_io_sync;
 	bio->bi_private = bh;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index bc462dc..46fae06 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1999,12 +1999,9 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)

 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio->bi_bdev = log->bdev;
-	bio->bi_io_vec[0].bv_page = bp->l_page;
-	bio->bi_io_vec[0].bv_len = LOGPSIZE;
-	bio->bi_io_vec[0].bv_offset = bp->l_offset;

-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = LOGPSIZE;
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);

 	bio->bi_end_io = lbmIODone;
 	bio->bi_private = bp;
@@ -2145,12 +2142,9 @@ static void lbmStartIO(struct lbuf * bp)
 	bio = bio_alloc(GFP_NOFS, 1);
 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio->bi_bdev = log->bdev;
-	bio->bi_io_vec[0].bv_page = bp->l_page;
-	bio->bi_io_vec[0].bv_len = LOGPSIZE;
-	bio->bi_io_vec[0].bv_offset = bp->l_offset;

-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = LOGPSIZE;
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);

 	bio->bi_end_io = lbmIODone;
 	bio->bi_private = bp;
diff --git a/mm/page_io.c b/mm/page_io.c
index 520baa4..194081b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -33,12 +33,10 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 	if (bio) {
 		bio->bi_iter.bi_sector = map_swap_page(page, &bio->bi_bdev);
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
-		bio->bi_io_vec[0].bv_page = page;
-		bio->bi_io_vec[0].bv_len = PAGE_SIZE;
-		bio->bi_io_vec[0].bv_offset = 0;
-		bio->bi_vcnt = 1;
-		bio->bi_iter.bi_size = PAGE_SIZE;
 		bio->bi_end_io = end_io;
+
+		bio_add_page(bio, page, PAGE_SIZE, 0);
+		BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE);
 	}
 	return bio;
 }
-- 
1.9.1
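As a rough illustration of why going through the helper is equivalent to the open-coded assignments for a single page, here is a userspace toy model of an "add page" helper. Everything named toy_* is hypothetical, and the real bio_add_page() does more than this. The sketch only assumes two behaviours described in patch 02 of this series: a chunk that directly continues the previous vector is merged into it, and otherwise the only limit is that the vector array must not overflow.

```c
#include <assert.h>

#define TOY_MAX_VECS 2	/* deliberately tiny so the "full" case is easy to hit */

struct toy_vec {
	const void *page;	/* stand-in for struct page * */
	unsigned len;
	unsigned offset;
};

struct toy_bio {
	unsigned vcnt;		/* vectors in use, like bi_vcnt */
	unsigned size;		/* total bytes, like bi_iter.bi_size */
	struct toy_vec vecs[TOY_MAX_VECS];
};

/* Returns the number of bytes added: 'len' on success, 0 if the bio is full. */
static unsigned toy_bio_add_page(struct toy_bio *bio, const void *page,
				 unsigned len, unsigned offset)
{
	struct toy_vec *vec;

	/* Merge with the previous vector when the new chunk continues it. */
	if (bio->vcnt > 0) {
		struct toy_vec *prev = &bio->vecs[bio->vcnt - 1];

		if (page == prev->page && offset == prev->offset + prev->len) {
			prev->len += len;
			bio->size += len;
			return len;
		}
	}

	/* Otherwise the only check is that the array does not overflow. */
	if (bio->vcnt >= TOY_MAX_VECS)
		return 0;

	vec = &bio->vecs[bio->vcnt++];
	vec->page = page;
	vec->len = len;
	vec->offset = offset;
	bio->size += len;
	return len;
}
```

For a freshly allocated bio with room for one vector, a single successful call leaves vcnt at 1 and size equal to the length passed in, which is the state the removed lines set by hand and the state the new BUG_ON() lines check.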
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
From: Kent Overstreet

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them. In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Ming Lei
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina
Cc: Geoff Levand
Cc: Jim Paris
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Oleg Drokin
Cc: Andreas Dilger
Acked-by: NeilBrown (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
---
 block/blk-core.c                            |  19 ++--
 block/blk-merge.c                           | 159 ++--
 block/blk-mq.c                              |   4 +
 block/blk-sysfs.c                           |   3 +
 drivers/block/drbd/drbd_req.c               |   2 +
 drivers/block/pktcdvd.c                     |   6 +-
 drivers/block/ps3vram.c                     |   2 +
 drivers/block/rsxx/dev.c                    |   2 +
 drivers/block/umem.c                        |   2 +
 drivers/block/zram/zram_drv.c               |   2 +
 drivers/md/dm.c                             |   2 +
 drivers/md/md.c                             |   2 +
 drivers/s390/block/dcssblk.c                |   2 +
 drivers/s390/block/xpram.c                  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h                      |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_q;
 
+	q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+	if (!q->bio_split)
+		goto fail_id;
+
 	q->backing_dev_info.ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 	q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 	err = bdi_init(&q->backing_dev_info);
 	if (err)
-		goto fail_id;
+		goto fail_split;
 
 	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 fail_bdi:
 	bdi_destroy(&q->backing_dev_info);
+fail_split:
+	bioset_free(q->bio_split);
 fail_id:
 	ida_simple_remove(&blk_queue_ida, q->id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio)
 	struct request *req;
 	unsigned int request_count = 0;
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	/*
 	 * low level driver can indicate that it wants pages above a
 	 * certain limit bounced to low memory (ie for highmem, or even
@@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
-	if (likely(bio_is_rw(bio) &&
-		   nr_sectors >
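The body of blk_queue_split() is not visible in this excerpt, so the following is only a userspace sketch of the idea the commit message describes: the submitter builds a request of any size, and the layer below carves off pieces that fit the device limit. The names struct io_range, io_range_split_front() and io_range_take() are hypothetical, and the model looks at a single max_sectors limit only; it assumes nothing about segment counts or bouncing.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for a bio: a start sector and a length in sectors. */
struct io_range {
	sector_t start;
	unsigned int nr_sectors;
};

/*
 * Carve the first 'sectors' off the front of 'r' and return them as a new
 * range; 'r' is advanced so that it describes only what is left.
 * The split point must fall strictly inside the range.
 */
static struct io_range io_range_split_front(struct io_range *r,
					    unsigned int sectors)
{
	struct io_range front;

	assert(sectors > 0 && sectors < r->nr_sectors);
	front.start = r->start;
	front.nr_sectors = sectors;
	r->start += sectors;
	r->nr_sectors -= sectors;
	return front;
}

/*
 * One splitting pass: if 'r' is larger than 'max_sectors', return the front
 * piece that fits and leave the remainder in 'r'. Otherwise return the whole
 * range and leave 'r' empty.
 */
static struct io_range io_range_take(struct io_range *r,
				     unsigned int max_sectors)
{
	struct io_range whole;

	if (r->nr_sectors > max_sectors)
		return io_range_split_front(r, max_sectors);

	whole = *r;
	r->start += r->nr_sectors;
	r->nr_sectors = 0;
	return whole;
}
```

A caller would invoke io_range_take() repeatedly until the range is empty, submitting each returned piece in turn; the code that created the request never needs to know the limit.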
[PATCH v5 06/11] md/raid5: split bio for chunk_aligned_read
From: Ming Lin

If a read request fits entirely in a chunk, it will be passed directly to the
underlying device (providing it hasn't failed of course). If it doesn't fit,
the slightly less efficient path that uses the stripe_cache is used.
Requests that get to the stripe cache are always completely split up as
necessary.

So with RAID5, ripping out the merge_bvec_fn doesn't cause it to stop working,
but could cause it to take the less efficient path more often.

All that is needed to manage this is for 'chunk_aligned_read' to do some bio
splitting, much like the RAID0 code does.

Cc: Neil Brown
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown
Signed-off-by: Ming Lin
---
 drivers/md/raid5.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 59e44e9..8377e72 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4799,7 +4799,7 @@ static int bio_fits_rdev(struct bio *bi)
 	return 1;
 }
 
-static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
+static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
 	struct r5conf *conf = mddev->private;
 	int dd_idx;
@@ -4808,7 +4808,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
 	sector_t end_sector;
 
 	if (!in_chunk_boundary(mddev, raid_bio)) {
-		pr_debug("chunk_aligned_read : non aligned\n");
+		pr_debug("%s: non aligned\n", __func__);
 		return 0;
 	}
 	/*
@@ -4885,6 +4885,31 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
 	}
 }
 
+static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
+{
+	struct bio *split;
+
+	do {
+		sector_t sector = raid_bio->bi_iter.bi_sector;
+		unsigned chunk_sects = mddev->chunk_sectors;
+		unsigned sectors = chunk_sects - (sector & (chunk_sects-1));
+
+		if (sectors < bio_sectors(raid_bio)) {
+			split = bio_split(raid_bio, sectors, GFP_NOIO, fs_bio_set);
+			bio_chain(split, raid_bio);
+		} else
+			split = raid_bio;
+
+		if (!raid5_read_one_chunk(mddev, split)) {
+			if (split != raid_bio)
+				generic_make_request(raid_bio);
+			return split;
+		}
+	} while (split != raid_bio);
+
+	return NULL;
+}
+
 /* __get_priority_stripe - get the next stripe to process
  *
  * Full stripe writes are allowed to pass preread active stripes up until
@@ -5162,9 +5187,11 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 	 * data on failed drives.
 	 */
 	if (rw == READ && mddev->degraded == 0 &&
-	     mddev->reshape_position == MaxSector &&
-	     chunk_aligned_read(mddev,bi))
-		return;
+	    mddev->reshape_position == MaxSector) {
+		bi = chunk_aligned_read(mddev, bi);
+		if (!bi)
+			return;
+	}
 
 	if (unlikely(bi->bi_rw & REQ_DISCARD)) {
 		make_discard_request(mddev, bi);
--
1.9.1
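The boundary arithmetic in the new chunk_aligned_read() can be tried outside the kernel. The sketch below is a userspace model, not the md code: struct extent and split_at_chunks() are hypothetical names, and it assumes the chunk size is a power of two, which is what the mask `sector & (chunk_sects-1)` in the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for a bio: a start sector and a length in sectors. */
struct extent {
	sector_t start;
	unsigned int nr_sectors;
};

/*
 * Number of sectors from 'sector' up to the end of the chunk containing it.
 * Same expression as in the patch; chunk_sects must be a power of two.
 */
static unsigned int sectors_to_chunk_end(sector_t sector,
					 unsigned int chunk_sects)
{
	assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
	return chunk_sects - (unsigned int)(sector & (chunk_sects - 1));
}

/*
 * Split 'req' into pieces that never cross a chunk boundary. Up to 'max_out'
 * pieces are stored in 'out'; the return value is the number of pieces the
 * request needs, which may exceed 'max_out'.
 */
static size_t split_at_chunks(struct extent req, unsigned int chunk_sects,
			      struct extent *out, size_t max_out)
{
	size_t n = 0;

	while (req.nr_sectors > 0) {
		unsigned int room = sectors_to_chunk_end(req.start, chunk_sects);
		unsigned int len = room < req.nr_sectors ? room : req.nr_sectors;

		if (n < max_out) {
			out[n].start = req.start;
			out[n].nr_sectors = len;
		}
		n++;
		req.start += len;
		req.nr_sectors -= len;
	}
	return n;
}
```

With 128-sector chunks, a 200-sector read starting at sector 100 comes out as three pieces: 28 sectors up to the first boundary, one full chunk, and a 44-sector tail. In the patch each such piece is handed to raid5_read_one_chunk(), and the loop stops early if one of them has to fall back to the stripe cache.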
[PATCH v5 00/11] simplify block layer based on immutable biovecs
Hi Mike,

On Wed, 2015-06-10 at 17:46 -0400, Mike Snitzer wrote:
> I've been busy getting DM changes for the 4.2 merge window finalized.
> As such I haven't connected with others on the team to discuss this
> issue.
>
> I'll see if we can make time in the next 2 days. But I also have
> RHEL-specific kernel deadlines I'm coming up against.
>
> Seems late to be staging this extensive a change for 4.2... are you
> pushing for this code to land in the 4.2 merge window? Or do we have
> time to work this further and target the 4.3 merge?
>

4.2-rc1 is out.
Would you have time to work together for the 4.3 merge?

Fio test results (4.1-rc4/rc7) showed no performance regressions for
HW/SW RAID6 and DM stripe tests.
http://minggr.net/pub/20150608/fio_results/summary.log

v5:
  - rebase on top of 4.2-rc1
  - reorder patch 6,7
  - add NeilBrown's ACKs
  - fix memory leak: free "bio_split" bioset in blk_release_queue()

v4:
  - rebase on top of 4.1-rc4
  - use BIO_POOL_SIZE instead of number 4 for bioset_create()
  - call blk_queue_split() in blk_mq_make_request()
  - call blk_queue_split() in zram_make_request()
  - add patch "block: remove bio_get_nr_vecs()"
  - remove split code in blkdev_issue_discard()
  - drop patch "md/raid10: make sync_request_write() call bio_copy_data()".
    NeilBrown queued it.
  - drop patch "block: allow __blk_queue_bounce() to handle bios larger than
    BIO_MAX_PAGES". Will send it separately

v3:
  - rebase on top of 4.1-rc2
  - support for QUEUE_FLAG_SG_GAPS
  - update commit logs of patch 2&4
  - split bio for chunk_aligned_read

v2: https://lkml.org/lkml/2015/4/28/28

v1: https://lkml.org/lkml/2014/12/22/128

This is the 5th attempt at simplifying the block layer based on immutable
biovecs. Immutable biovecs, implemented by Kent Overstreet, have been
available in mainline since v3.14. Their original goal was actually making
generic_make_request() accept arbitrarily sized bios, and pushing the
splitting down to the drivers or wherever it's required. See also
discussions in the past, [1] [2] [3].

This will bring not only performance improvements, but also a great amount
of reduction in code complexity all over the block layer. Performance gain
is possible due to the fact that bio_add_page() does not have to check
unnecessary conditions such as queue limits or if biovecs are mergeable.
Those will be delegated to the driver level. Kent already said that he
actually benchmarked the impact of this with fio on a micron p320h, which
showed a definite positive impact.

Moreover, this patchset also allows a lot of code to be deleted, mainly
because of removal of merge_bvec_fn() callbacks. We have been aware that
it has always been a delicate issue for stacking block drivers (e.g. md
and bcache) to handle merging bio consistently. This simplification will
help every individual block driver avoid having such an issue.

Patches are against 4.2-rc1. These are also available in my git repo at:

  https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
  git://git.kernel.org/pub/scm/linux/kernel/git/mlin/linux.git block-generic-req

This patchset is a prerequisite for other follow-up patchsets, e.g.
multipage biovecs, rewriting plugging, or rewriting direct-IO, which are
excluded this time. That means this patchset should not bring any
regression to end-users.

Comments are welcome.
Ming

[1] https://lkml.org/lkml/2014/11/23/263
[2] https://lkml.org/lkml/2013/11/25/732
[3] https://lkml.org/lkml/2014/2/26/618

Dongsu Park (1):
  Documentation: update notes in biovecs about arbitrarily sized bios

Kent Overstreet (8):
  block: make generic_make_request handle arbitrarily sized bios
  block: simplify bio_add_page()
  bcache: remove driver private bio splitting code
  btrfs: remove bio splitting and merge_bvec_fn() calls
  md/raid5: get rid of bio_fits_rdev()
  block: kill merge_bvec_fn() completely
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: remove bio_get_nr_vecs()

Ming Lin (2):
  block: remove split code in blkdev_issue_discard
  md/raid5: split bio for chunk_aligned_read

 Documentation/block/biovecs.txt |  10 +-
 block/bio.c                     | 152 ++--
 block/blk-core.c                |  19 ++--
 block/blk-lib.c                 |  73 +++--
 block/blk-merge.c               | 148 +--
 block/blk-mq.c                  |   4 +
 block/blk-settings.c            |  22
 block/blk-sysfs.c               |   3 +
 drivers/block/drbd/drbd_int.h   |   1 -
 drivers/block/drbd/drbd_main.c  |   1 -
 drivers/block/drbd/drbd_req.c   |  37 +--
 drivers/block/pktcdvd.c         |  27 +
 drivers/block/ps3vram.c
[PATCH v5 09/11] fs: use helper bio_add_page() instead of open coding on bi_io_vec
From: Kent Overstreet <kent.overstr...@gmail.com>

Call pre-defined helper bio_add_page() instead of open coding for
iterating through bi_io_vec[]. Doing that, it's possible to make some
parts in filesystems and mm/page_io.c simpler than before.

Acked-by: Dave Kleikamp <sha...@kernel.org>
Cc: Christoph Hellwig <h...@infradead.org>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 fs/buffer.c         |  7 ++-
 fs/jfs/jfs_logmgr.c | 14 --
 mm/page_io.c        |  8 +++-
 3 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1cf7a53..95996ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3046,12 +3046,9 @@ static int submit_bh_wbc(int rw, struct buffer_head *bh,
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio->bi_bdev = bh->b_bdev;
-	bio->bi_io_vec[0].bv_page = bh->b_page;
-	bio->bi_io_vec[0].bv_len = bh->b_size;
-	bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 
-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = bh->b_size;
+	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+	BUG_ON(bio->bi_iter.bi_size != bh->b_size);
 
 	bio->bi_end_io = end_bio_bh_io_sync;
 	bio->bi_private = bh;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index bc462dc..46fae06 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1999,12 +1999,9 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 
 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio->bi_bdev = log->bdev;
-	bio->bi_io_vec[0].bv_page = bp->l_page;
-	bio->bi_io_vec[0].bv_len = LOGPSIZE;
-	bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = LOGPSIZE;
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
 	bio->bi_end_io = lbmIODone;
 	bio->bi_private = bp;
@@ -2145,12 +2142,9 @@ static void lbmStartIO(struct lbuf * bp)
 	bio = bio_alloc(GFP_NOFS, 1);
 	bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
 	bio->bi_bdev = log->bdev;
-	bio->bi_io_vec[0].bv_page = bp->l_page;
-	bio->bi_io_vec[0].bv_len = LOGPSIZE;
-	bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
-	bio->bi_vcnt = 1;
-	bio->bi_iter.bi_size = LOGPSIZE;
+	bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+	BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
 	bio->bi_end_io = lbmIODone;
 	bio->bi_private = bp;
diff --git a/mm/page_io.c b/mm/page_io.c
index 520baa4..194081b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -33,12 +33,10 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 	if (bio) {
 		bio->bi_iter.bi_sector = map_swap_page(page, &bio->bi_bdev);
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
-		bio->bi_io_vec[0].bv_page = page;
-		bio->bi_io_vec[0].bv_len = PAGE_SIZE;
-		bio->bi_io_vec[0].bv_offset = 0;
-		bio->bi_vcnt = 1;
-		bio->bi_iter.bi_size = PAGE_SIZE;
 		bio->bi_end_io = end_io;
+
+		bio_add_page(bio, page, PAGE_SIZE, 0);
+		BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE);
 	}
 	return bio;
 }
--
1.9.1
[PATCH v5 04/11] btrfs: remove bio splitting and merge_bvec_fn() calls
From: Kent Overstreet <kent.overstr...@gmail.com>

Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling ->merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.

Cc: Chris Mason <c...@fb.com>
Cc: Josef Bacik <jba...@fb.com>
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
Signed-off-by: Chris Mason <c...@fb.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 fs/btrfs/volumes.c | 72 --
 1 file changed, 72 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b438b4..fd25b81 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5833,34 +5833,6 @@ static noinline void btrfs_schedule_bio(struct btrfs_root *root,
 			 &device->work);
 }
 
-static int bio_size_ok(struct block_device *bdev, struct bio *bio,
-		       sector_t sector)
-{
-	struct bio_vec *prev;
-	struct request_queue *q = bdev_get_queue(bdev);
-	unsigned int max_sectors = queue_max_sectors(q);
-	struct bvec_merge_data bvm = {
-		.bi_bdev = bdev,
-		.bi_sector = sector,
-		.bi_rw = bio->bi_rw,
-	};
-
-	if (WARN_ON(bio->bi_vcnt == 0))
-		return 1;
-
-	prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-	if (bio_sectors(bio) > max_sectors)
-		return 0;
-
-	if (!q->merge_bvec_fn)
-		return 1;
-
-	bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
-	if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
-		return 0;
-	return 1;
-}
-
 static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 			      struct bio *bio, u64 physical, int dev_nr,
 			      int rw, int async)
@@ -5894,38 +5866,6 @@ static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
 	btrfsic_submit_bio(rw, bio);
 }
 
-static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
-			      struct bio *first_bio, struct btrfs_device *dev,
-			      int dev_nr, int rw, int async)
-{
-	struct bio_vec *bvec = first_bio->bi_io_vec;
-	struct bio *bio;
-	int nr_vecs = bio_get_nr_vecs(dev->bdev);
-	u64 physical = bbio->stripes[dev_nr].physical;
-
-again:
-	bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
-	if (!bio)
-		return -ENOMEM;
-
-	while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
-		if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-				 bvec->bv_offset) < bvec->bv_len) {
-			u64 len = bio->bi_iter.bi_size;
-
-			atomic_inc(&bbio->stripes_pending);
-			submit_stripe_bio(root, bbio, bio, physical, dev_nr,
-					  rw, async);
-			physical += len;
-			goto again;
-		}
-		bvec++;
-	}
-
-	submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
-	return 0;
-}
-
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 {
 	atomic_inc(&bbio->error);
@@ -5998,18 +5938,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 			continue;
 		}
 
-		/*
-		 * Check and see if we're ok with this bio based on it's size
-		 * and offset with the given device.
-		 */
-		if (!bio_size_ok(dev->bdev, first_bio,
-				 bbio->stripes[dev_nr].physical >> 9)) {
-			ret = breakup_stripe_bio(root, bbio, first_bio, dev,
-						 dev_nr, rw, async_submit);
-			BUG_ON(ret);
-			continue;
-		}
-
 		if (dev_nr < total_devs - 1) {
 			bio = btrfs_bio_clone(first_bio, GFP_NOFS);
 			BUG_ON(!bio); /* -ENOMEM */
--
1.9.1
[PATCH v5 07/11] md/raid5: get rid of bio_fits_rdev()
From: Kent Overstreet <kent.overstr...@gmail.com>

Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
performed by blk_queue_split() in md_make_request().

Cc: Neil Brown <ne...@suse.de>
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown <ne...@suse.de>
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/md/raid5.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8377e72..8bdf81a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4780,25 +4780,6 @@ static void raid5_align_endio(struct bio *bi, int error)
 	add_bio_to_retry(raid_bi, conf);
 }
 
-static int bio_fits_rdev(struct bio *bi)
-{
-	struct request_queue *q = bdev_get_queue(bi->bi_bdev);
-
-	if (bio_sectors(bi) > queue_max_sectors(q))
-		return 0;
-	blk_recount_segments(q, bi);
-	if (bi->bi_phys_segments > queue_max_segments(q))
-		return 0;
-
-	if (q->merge_bvec_fn)
-		/* it's too hard to apply the merge_bvec_fn at this stage,
-		 * just just give up
-		 */
-		return 0;
-
-	return 1;
-}
-
 static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
 	struct r5conf *conf = mddev->private;
@@ -4852,11 +4833,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 		align_bi->bi_bdev = rdev->bdev;
 		__clear_bit(BIO_SEG_VALID, &align_bi->bi_flags);
 
-		if (!bio_fits_rdev(align_bi) ||
-		    is_badblock(rdev, align_bi->bi_iter.bi_sector,
+		if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
 				bio_sectors(align_bi),
 				&first_bad, &bad_sectors)) {
-			/* too big in some way, or has a known bad block */
 			bio_put(align_bi);
 			rdev_dec_pending(rdev, mddev);
 			return 0;
--
1.9.1
[PATCH v5 02/11] block: simplify bio_add_page()
From: Kent Overstreet <kent.overstr...@gmail.com>

Since generic_make_request() can now handle arbitrary size bios, all we
have to do is make sure the bvec array doesn't overflow.
__bio_add_page() doesn't need to call ->merge_bvec_fn(), where we can get
rid of unnecessary code paths.

Removing the call to ->merge_bvec_fn() is also fine, as no driver that
implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
method.

Cc: Christoph Hellwig <h...@infradead.org>
Cc: Jens Axboe <ax...@kernel.dk>
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: rebase and resolve merge conflicts, change a couple of comments,
 make bio_add_page() warn once upon a cloned bio.]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/bio.c | 135 +---
 1 file changed, 55 insertions(+), 80 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2a00d34..da15e9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -714,9 +714,23 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
-			  *page, unsigned int len, unsigned int offset,
-			  unsigned int max_sectors)
+/**
+ *	bio_add_pc_page	-	attempt to add page to bio
+ *	@q: the target queue
+ *	@bio: destination bio
+ *	@page: page to add
+ *	@len: vec entry length
+ *	@offset: vec entry offset
+ *
+ *	Attempt to add a page to the bio_vec maplist. This can fail for a
+ *	number of reasons, such as the bio being full or target block device
+ *	limitations. The target block device must allow bio's up to PAGE_SIZE,
+ *	so it is always possible to add a single page to an empty bio.
+ *
+ *	This should only be used by REQ_PC bios.
+ */
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+		    *page, unsigned int len, unsigned int offset)
 {
 	int retried_segments = 0;
 	struct bio_vec *bvec;
@@ -727,7 +741,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 	if (unlikely(bio_flagged(bio, BIO_CLONED)))
 		return 0;
 
-	if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
+	if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
 		return 0;
 
 	/*
@@ -740,28 +754,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 
 		if (page == prev->bv_page &&
 		    offset == prev->bv_offset + prev->bv_len) {
-			unsigned int prev_bv_len = prev->bv_len;
 			prev->bv_len += len;
-
-			if (q->merge_bvec_fn) {
-				struct bvec_merge_data bvm = {
-					/* prev_bvec is already charged in
-					   bi_size, discharge it in order to
-					   simulate merging updated prev_bvec
-					   as new bvec. */
-					.bi_bdev = bio->bi_bdev,
-					.bi_sector = bio->bi_iter.bi_sector,
-					.bi_size = bio->bi_iter.bi_size -
-						prev_bv_len,
-					.bi_rw = bio->bi_rw,
-				};
-
-				if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-					prev->bv_len -= len;
-					return 0;
-				}
-			}
-
 			bio->bi_iter.bi_size += len;
 			goto done;
 		}
@@ -804,27 +797,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 		blk_recount_segments(q, bio);
 	}
 
-	/*
-	 * if queue has other restrictions (eg varying max sector size
-	 * depending on offset), it can specify a merge_bvec_fn in the
-	 * queue to get further control
-	 */
-	if (q->merge_bvec_fn) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev = bio->bi_bdev,
-			.bi_sector = bio->bi_iter.bi_sector,
-			.bi_size = bio->bi_iter.bi_size - len,
-			.bi_rw = bio->bi_rw,
-		};
-
-		/*
-		 * merge_bvec_fn() returns number of bytes it can accept
-		 * at this offset
-		 */
-		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
-			goto failed;
-	}
-
 	/* If we may be able to merge these biovecs, force a recount */
 	if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
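The commit message's point, that adding a page now only has to guard against overflowing the bvec array, can be modelled in userspace C roughly as below. This is a sketch under stated assumptions, not the kernel code: toy_bio, toy_vec, toy_bio_init(), toy_bio_add_page() and TOY_MAX_VECS are hypothetical names, pages are represented by integer ids, and the queue-limit checks that bio_add_pc_page() keeps are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define TOY_MAX_VECS 4

struct toy_vec {
	int page_id;		/* stands in for struct page * */
	unsigned int len;
	unsigned int offset;
};

struct toy_bio {
	struct toy_vec vecs[TOY_MAX_VECS];
	unsigned int vcnt;	/* vecs in use */
	unsigned int max_vecs;	/* capacity of vecs[] */
	unsigned int size;	/* total bytes queued */
};

static void toy_bio_init(struct toy_bio *bio)
{
	bio->vcnt = 0;
	bio->max_vecs = TOY_MAX_VECS;
	bio->size = 0;
}

/*
 * Returns the number of bytes added: len on success, 0 if the vec
 * array is full. No per-queue callback is consulted.
 */
static unsigned int toy_bio_add_page(struct toy_bio *bio, int page_id,
				     unsigned int len, unsigned int offset)
{
	struct toy_vec *v;

	assert(bio != NULL);

	/* Try to extend the previous vec if the new range is contiguous. */
	if (bio->vcnt > 0) {
		v = &bio->vecs[bio->vcnt - 1];
		if (v->page_id == page_id && offset == v->offset + v->len) {
			v->len += len;
			bio->size += len;
			return len;
		}
	}

	/* The only remaining failure mode in this model: array overflow. */
	if (bio->vcnt >= bio->max_vecs)
		return 0;

	v = &bio->vecs[bio->vcnt++];
	v->page_id = page_id;
	v->len = len;
	v->offset = offset;
	bio->size += len;
	return len;
}
```

In this model a caller keeps adding pages until toy_bio_add_page() returns 0, then submits and starts a new bio; how large the result may be for the underlying device is left to the splitting done at submission time.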
[PATCH v5 08/11] block: kill merge_bvec_fn() completely
From: Kent Overstreet <kent.overstr...@gmail.com>

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <ax...@kernel.dk>
Cc: Lars Ellenberg <drbd-...@lists.linbit.com>
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina <jkos...@suse.cz>
Cc: Yehuda Sadeh <yeh...@inktank.com>
Cc: Sage Weil <s...@inktank.com>
Cc: Alex Elder <el...@kernel.org>
Cc: ceph-de...@vger.kernel.org
Cc: Alasdair Kergon <a...@redhat.com>
Cc: Mike Snitzer <snit...@redhat.com>
Cc: dm-de...@redhat.com
Cc: Neil Brown <ne...@suse.de>
Cc: linux-r...@vger.kernel.org
Cc: Christoph Hellwig <h...@infradead.org>
Cc: Martin K. Petersen <martin.peter...@oracle.com>
Acked-by: NeilBrown <ne...@suse.de> (for the 'md' bits)
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/blk-merge.c              |  17 +-
 block/blk-settings.c           |  22 ---
 drivers/block/drbd/drbd_int.h  |   1 -
 drivers/block/drbd/drbd_main.c |   1 -
 drivers/block/drbd/drbd_req.c  |  35
 drivers/block/pktcdvd.c        |  21 ---
 drivers/block/rbd.c            |  47 ---
 drivers/md/dm-cache-target.c   |  21 ---
 drivers/md/dm-crypt.c          |  16 --
 drivers/md/dm-era-target.c     |  15 -
 drivers/md/dm-flakey.c         |  16 --
 drivers/md/dm-linear.c         |  16 --
 drivers/md/dm-log-writes.c     |  16 --
 drivers/md/dm-raid.c           |  19 --
 drivers/md/dm-snap.c           |  15 -
 drivers/md/dm-stripe.c         |  21 ---
 drivers/md/dm-table.c          |   8 ---
 drivers/md/dm-thin.c           |  31 --
 drivers/md/dm-verity.c         |  16 --
 drivers/md/dm.c                | 127 +
 drivers/md/dm.h                |   2 -
 drivers/md/linear.c            |  43 --
 drivers/md/md.c                |  26 -
 drivers/md/md.h                |  12
 drivers/md/multipath.c         |  21 ---
 drivers/md/raid0.c             |  56 --
 drivers/md/raid0.h             |   2 -
 drivers/md/raid1.c             |  58 +--
 drivers/md/raid10.c            | 121 +--
 drivers/md/raid5.c             |  32 ---
 include/linux/blkdev.h         |  10
 include/linux/device-mapper.h  |   4 --
 32 files changed, 9 insertions(+), 859 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3707f30..1f5dfa0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -69,24 +69,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *split;
 	struct bio_vec bv, bvprv;
 	struct bvec_iter iter;
-	unsigned seg_size = 0, nsegs = 0;
+	unsigned seg_size = 0, nsegs = 0, sectors = 0;
 	int prev = 0;
 
-	struct bvec_merge_data bvm = {
-		.bi_bdev	= bio->bi_bdev,
-		.bi_sector	= bio->bi_iter.bi_sector,
-		.bi_size	= 0,
-		.bi_rw		= bio->bi_rw,
-	};
-
 	bio_for_each_segment(bv, bio, iter) {
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			goto split;
-
-		bvm.bi_size += bv.bv_len;
+		sectors += bv.bv_len >> 9;
 
-		if (bvm.bi_size >> 9 > queue_max_sectors(q))
+		if (sectors > queue_max_sectors(q))
 			goto split;
 
 		/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..e90d477 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -53,28 +53,6 @@ void blk_queue_unprep_rq(struct request_queue *q, unprep_rq_fn *ufn)
 }
 EXPORT_SYMBOL(blk_queue_unprep_rq);
 
-/**
- * blk_queue_merge_bvec - set a merge_bvec function for queue
- * @q:		queue
- * @mbfn:	merge_bvec_fn
- *
- * Usually queues have static limitations on the max sectors or segments that
- * we can put in a request. Stacking drivers may have some settings that
- * are dynamic, and thus we have to query the queue whether it is ok to
- * add a new bio_vec to a bio at a given offset or not. If the block device
- * has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Note that a block device *must* allow a
- * single page to be added to an empty bio. The block device driver may want
- * to use the bio_split() function to deal with these bio's. By default
- * no merge_bvec_fn is defined for a queue, and only the fixed limits are
- * honored.
- */
-void blk_queue_merge_bvec(struct request_queue *q, merge_bvec_fn *mbfn)
-{
-	q->merge_bvec_fn = mbfn;
-}
[PATCH v5 03/11] bcache: remove driver private bio splitting code
From: Kent Overstreet <kent.overstr...@gmail.com>

The bcache driver has always accepted arbitrarily large bios and split
them internally. Now that every driver must accept arbitrarily large
bios, this code isn't necessary anymore.

Cc: linux-bca...@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/md/bcache/bcache.h    |  18
 drivers/md/bcache/io.c        | 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c     |  32 +-
 drivers/md/bcache/util.h      |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 7 files changed, 18 insertions(+), 161 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 04f7bc2..6b420a5 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -243,19 +243,6 @@ struct keybuf {
 	DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-	struct bio_set		*bio_split;
-	mempool_t		*bio_split_hook;
-};
-
-struct bio_split_hook {
-	struct closure		cl;
-	struct bio_split_pool	*p;
-	struct bio		*bio;
-	bio_end_io_t		*bi_end_io;
-	void			*bi_private;
-};
-
 struct bcache_device {
 	struct closure		cl;
 
@@ -288,8 +275,6 @@ struct bcache_device {
 	int (*cache_miss)(struct btree *, struct search *,
 			  struct bio *, unsigned);
 	int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct io {
@@ -454,8 +439,6 @@ struct cache {
 	atomic_long_t		meta_sectors_written;
 	atomic_long_t		btree_sectors_written;
 	atomic_long_t		sectors_written;
-
-	struct bio_split_pool	bio_split_hook;
 };
 
 struct gc_stat {
@@ -873,7 +856,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index cb64e64a..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io = s->bi_end_io;
-	s->bio->bi_private = s->bi_private;
-	bio_endio(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-
[PATCH v5 05/11] block: remove split code in blkdev_issue_discard
From: Ming Lin <min...@ssi.samsung.com>

The split code in blkdev_issue_discard() can go away now that any driver
that cares does the split.

Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/blk-lib.c | 73 +++--
 1 file changed, 14 insertions(+), 59 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..3bf3c4a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -43,34 +43,17 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
 	struct blk_plug plug;
 
-	if (!q)
+	if (!q || !nr_sects)
 		return -ENXIO;
 
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -82,52 +65,24 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	bb.wait = &wait;
 
 	blk_start_plug(&plug);
-	while (nr_sects) {
-		unsigned int req_sects;
-		sector_t end_sect, tmp;
 
-		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
+	bio = bio_alloc(gfp_mask, 1);
+	if (!bio) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
-		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_end_io = bio_batch_end_io;
+	bio->bi_bdev = bdev;
+	bio->bi_private = &bb;
 
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
-		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+	bio->bi_iter.bi_size = nr_sects << 9;
 
-		bio->bi_iter.bi_size = req_sects << 9;
-		nr_sects -= req_sects;
-		sector = end_sect;
+	atomic_inc(&bb.done);
+	submit_bio(type, bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(type, bio);
-
-		/*
-		 * We can loop for a long time in here, if someone does
-		 * full device discards (like mkfs). Be nice and allow
-		 * us to schedule out to avoid softlocking if preempt
-		 * is disabled.
-		 */
-		cond_resched();
-	}
+out:
 	blk_finish_plug(&plug);
 
 	/* Wait for bios in-flight */
--
1.9.1
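For readers who want to see what the removed loop in blkdev_issue_discard() computed, the alignment-aware chunking can be modelled in userspace C as below. This is a sketch under stated assumptions, not kernel code: toy_discard_limits and toy_split_discard() are hypothetical names, plain 64-bit division replaces sector_div(), and all quantities are in 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the removed loop; the names below are
 * illustrative and are not kernel APIs. All quantities are in sectors.
 */
struct toy_discard_limits {
	uint64_t max_discard_sectors;	/* largest single discard */
	uint64_t granularity;		/* must be >= 1 */
	uint64_t alignment;		/* must be < granularity */
};

/*
 * Split [sector, sector + nr_sects) into chunks, storing chunk lengths in
 * out[]. Returns the number of chunks, or 0 if the limits cannot be used
 * or out[] is too small.
 */
static size_t toy_split_discard(uint64_t sector, uint64_t nr_sects,
				const struct toy_discard_limits *lim,
				uint64_t *out, size_t out_len)
{
	uint64_t max_sects;
	size_t n = 0;

	assert(lim->granularity >= 1 && lim->alignment < lim->granularity);

	/* Round the per-chunk maximum down to a multiple of the granularity. */
	max_sects = lim->max_discard_sectors -
		    lim->max_discard_sectors % lim->granularity;
	if (max_sects == 0)
		return 0;

	while (nr_sects) {
		uint64_t req = nr_sects < max_sects ? nr_sects : max_sects;
		uint64_t end = sector + req;

		/*
		 * If more chunks follow and the next start would be misaligned,
		 * stop this chunk at the previous aligned sector.
		 */
		if (req < nr_sects && end % lim->granularity != lim->alignment) {
			end = (end - lim->alignment) / lim->granularity *
			      lim->granularity + lim->alignment;
			req = end - sector;
		}

		if (n == out_len)
			return 0;
		out[n++] = req;
		nr_sects -= req;
		sector = end;
	}
	return n;
}
```

With the patch applied, the caller no longer does this arithmetic: it submits a single bio of nr_sects << 9 bytes and relies on the lower layers to split it, as the commit message states.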
[PATCH v5 10/11] block: remove bio_get_nr_vecs()
From: Kent Overstreet <kent.overstr...@gmail.com>

We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.

Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 block/bio.c            | 23 ---
 drivers/md/dm-io.c     |  2 +-
 fs/btrfs/compression.c |  5 +
 fs/btrfs/extent_io.c   |  9 ++---
 fs/btrfs/inode.c       |  3 +--
 fs/btrfs/scrub.c       | 18 ++
 fs/direct-io.c         |  2 +-
 fs/ext4/page-io.c      |  3 +--
 fs/ext4/readpage.c     |  2 +-
 fs/f2fs/data.c         |  2 +-
 fs/gfs2/lops.c         |  9 +
 fs/logfs/dev_bdev.c    |  4 ++--
 fs/mpage.c             |  4 ++--
 fs/nilfs2/segbuf.c     |  2 +-
 fs/xfs/xfs_aops.c      |  3 +--
 include/linux/bio.h    |  1 -
 16 files changed, 18 insertions(+), 74 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index da15e9a..f28ca16 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -692,29 +692,6 @@ integrity_clone:
 EXPORT_SYMBOL(bio_clone_bioset);
 
 /**
- *	bio_get_nr_vecs		- return approx number of vecs
- *	@bdev:  I/O target
- *
- *	Return the approximate number of pages we can send to this target.
- *	There's no guarantee that you will be able to fit this number of pages
- *	into a bio, it does not account for dynamic restrictions that vary
- *	on offset.
- */
-int bio_get_nr_vecs(struct block_device *bdev)
-{
-	struct request_queue *q = bdev_get_queue(bdev);
-	int nr_pages;
-
-	nr_pages = min_t(unsigned,
-		     queue_max_segments(q),
-		     queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
-
-	return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
-
-}
-EXPORT_SYMBOL(bio_get_nr_vecs);
-
-/**
  *	bio_add_pc_page	-	attempt to add page to bio
  *	@q: the target queue
  *	@bio: destination bio
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 74adcd2..7d64272 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -314,7 +314,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 		if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
 			num_bvecs = 1;
 		else
-			num_bvecs = min_t(int, bio_get_nr_vecs(where->bdev),
+			num_bvecs = min_t(int, BIO_MAX_PAGES,
 					  dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));
 
 		bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324..449c752 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -97,10 +97,7 @@ static inline int compressed_bio_size(struct btrfs_root *root,
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
 					u64 first_byte, gfp_t gfp_flags)
 {
-	int nr_vecs;
-
-	nr_vecs = bio_get_nr_vecs(bdev);
-	return btrfs_bio_alloc(bdev, first_byte >> 9, nr_vecs, gfp_flags);
+	return btrfs_bio_alloc(bdev, first_byte >> 9, BIO_MAX_PAGES, gfp_flags);
 }
 
 static int check_compressed_csum(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02d0581..ba89efd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2802,9 +2802,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 {
 	int ret = 0;
 	struct bio *bio;
-	int nr;
 	int contig = 0;
-	int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED;
 	int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
 	size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE);
 
@@ -2829,12 +2827,9 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 			return 0;
 		}
 	}
-	if (this_compressed)
-		nr = BIO_MAX_PAGES;
-	else
-		nr = bio_get_nr_vecs(bdev);
 
-	bio = btrfs_bio_alloc(bdev, sector, nr, GFP_NOFS | __GFP_HIGH);
+	bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES,
+			GFP_NOFS | __GFP_HIGH);
 	if (!bio)
 		return -ENOMEM;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 855935f..d66b9a3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7959,8 +7959,7 @@ out:
 static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
 				       u64 first_sector, gfp_t gfp_flags)
 {
-	int nr_vecs = bio_get_nr_vecs(bdev);
-	return btrfs_bio_alloc(bdev, first_sector, nr_vecs, gfp_flags);
+	return btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
 }
 
 static inline int btrfs_lookup_and_bind_dio_csum(struct btrfs_root *root,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f2feab..aab0b9a 100644
---
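The arithmetic of the removed bio_get_nr_vecs() helper, and what the converted callers pass instead, can be compared with a small userspace C model. This is a sketch under stated assumptions: the toy_* names are hypothetical, and a 4096-byte page and a BIO_MAX_PAGES of 256 are assumed values for illustration.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; values are assumptions for illustration. */
#define TOY_PAGE_SIZE     4096u
#define TOY_BIO_MAX_PAGES 256u

static unsigned int toy_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old-style estimate, following the removed bio_get_nr_vecs() arithmetic. */
static unsigned int toy_get_nr_vecs(unsigned int max_segments,
				    unsigned int max_sectors)
{
	unsigned int nr_pages;

	nr_pages = toy_min(max_segments,
			   max_sectors / (TOY_PAGE_SIZE >> 9) + 1);
	return toy_min(nr_pages, TOY_BIO_MAX_PAGES);
}

/* New-style sizing as used by the converted callers: always the maximum. */
static unsigned int toy_nr_vecs_after_patch(void)
{
	return TOY_BIO_MAX_PAGES;
}
```

The old estimate shrinks with the queue's segment and sector limits, while the new sizing ignores the queue entirely, which matches the changelog's statement that the bio can always be filled up.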
[PATCH v5 11/11] Documentation: update notes in biovecs about arbitrarily sized bios
From: Dongsu Park <dp...@posteo.net>

Update block/biovecs.txt so that it includes a note on what kind of
effects arbitrarily sized bios would bring to the block layer.
Also fix a trivial typo, bio_iter_iovec.

Cc: Christoph Hellwig <h...@infradead.org>
Cc: Kent Overstreet <kent.overstr...@gmail.com>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: linux-...@vger.kernel.org
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 Documentation/block/biovecs.txt | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 74a32ad..2568958 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed biovecs so that
 normal code doesn't have to deal with bi_bvec_done.
 
  * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
    constructed from the raw biovecs but taking into account bi_bvec_done and
    bi_size.
 
@@ -109,3 +109,11 @@ Other implications:
    over all the biovecs in the new bio - which is silly as it's not needed.
 
    So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
--
1.9.1