[PATCH v5 11/11] Documentation: update notes in biovecs about arbitrarily sized bios

2015-07-06 Thread mlin
From: Dongsu Park 

Update block/biovecs.txt so that it includes a note on the effects that
arbitrarily sized bios have on the block layer.
Also fix a trivial typo: bio_iovec_iter() should read bio_iter_iovec().

Cc: Christoph Hellwig 
Cc: Kent Overstreet 
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 Documentation/block/biovecs.txt | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 74a32ad..2568958 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed 
biovecs so that
 normal code doesn't have to deal with bi_bvec_done.
 
  * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
constructed from the raw biovecs but taking into account bi_bvec_done and
bi_size.
 
@@ -109,3 +109,11 @@ Other implications:
over all the biovecs in the new bio - which is silly as it's not needed.
 
So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 05/11] block: remove split code in blkdev_issue_discard

2015-07-06 Thread mlin
From: Ming Lin 

The split code in blkdev_issue_discard() can go away now
that any driver that cares does the split.

Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 73 +++--
 1 file changed, 14 insertions(+), 59 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..3bf3c4a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -43,34 +43,17 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
struct blk_plug plug;
 
-   if (!q)
+   if (!q || !nr_sects)
return -ENXIO;
 
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -82,52 +65,24 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
bb.wait = &wait;
 
blk_start_plug(&plug);
-   while (nr_sects) {
-   unsigned int req_sects;
-   sector_t end_sect, tmp;
 
-   bio = bio_alloc(gfp_mask, 1);
-   if (!bio) {
-   ret = -ENOMEM;
-   break;
-   }
+   bio = bio_alloc(gfp_mask, 1);
+   if (!bio) {
+   ret = -ENOMEM;
+   goto out;
+   }
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
-   end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_end_io = bio_batch_end_io;
+   bio->bi_bdev = bdev;
+   bio->bi_private = &bb;
 
-   bio->bi_iter.bi_sector = sector;
-   bio->bi_end_io = bio_batch_end_io;
-   bio->bi_bdev = bdev;
-   bio->bi_private = &bb;
+   bio->bi_iter.bi_size = nr_sects << 9;
 
-   bio->bi_iter.bi_size = req_sects << 9;
-   nr_sects -= req_sects;
-   sector = end_sect;
+   atomic_inc(&bb.done);
+   submit_bio(type, bio);
 
-   atomic_inc(&bb.done);
-   submit_bio(type, bio);
-
-   /*
-* We can loop for a long time in here, if someone does
-* full device discards (like mkfs). Be nice and allow
-* us to schedule out to avoid softlocking if preempt
-* is disabled.
-*/
-   cond_resched();
-   }
+out:
blk_finish_plug(&plug);
 
/* Wait for bios in-flight */
-- 
1.9.1
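
Note: the loop removed above kept every discard chunk aligned to the
device's discard granularity. The following is a minimal user-space sketch
of that arithmetic, not kernel code. The names discard_limits and
next_discard_chunk are hypothetical, and plain uint64_t stands in for
sector_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue limits the removed loop consulted. */
struct discard_limits {
    uint64_t granularity;  /* in sectors, at least 1 */
    uint64_t alignment;    /* in sectors, less than granularity */
    uint64_t max_sectors;  /* nonzero multiple of granularity */
};

/*
 * Length, in sectors, of the next discard chunk starting at `sector`
 * when `nr_sects` sectors remain. If the range has to be split and the
 * next chunk would start misaligned, the chunk is shortened so that it
 * ends on the previous aligned sector.
 */
static uint64_t next_discard_chunk(const struct discard_limits *lim,
                                   uint64_t sector, uint64_t nr_sects)
{
    uint64_t req = nr_sects < lim->max_sectors ? nr_sects : lim->max_sectors;
    uint64_t end = sector + req;

    assert(lim->granularity > 0 && lim->alignment < lim->granularity);

    if (req < nr_sects && end % lim->granularity != lim->alignment) {
        /* Round the end down to the nearest aligned sector. */
        end = (end - lim->alignment) / lim->granularity * lim->granularity
              + lim->alignment;
        req = end - sector;
    }
    return req;
}
```

With a granularity of 8, an alignment of 0 and a maximum of 64 sectors, a
200-sector discard starting at sector 3 gets a first chunk of 61 sectors, so
the second chunk starts at the aligned sector 64. After this patch the caller
submits one bio for the whole range and leaves any such splitting to the
lower layers.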



[PATCH v5 10/11] block: remove bio_get_nr_vecs()

2015-07-06 Thread mlin
From: Kent Overstreet 

We can always fill up the bio now, so there is no need to estimate the
possible size based on queue parameters.

Signed-off-by: Kent Overstreet 
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig 
Signed-off-by: Ming Lin 
---
 block/bio.c| 23 ---
 drivers/md/dm-io.c |  2 +-
 fs/btrfs/compression.c |  5 +
 fs/btrfs/extent_io.c   |  9 ++---
 fs/btrfs/inode.c   |  3 +--
 fs/btrfs/scrub.c   | 18 ++
 fs/direct-io.c |  2 +-
 fs/ext4/page-io.c  |  3 +--
 fs/ext4/readpage.c |  2 +-
 fs/f2fs/data.c |  2 +-
 fs/gfs2/lops.c |  9 +
 fs/logfs/dev_bdev.c|  4 ++--
 fs/mpage.c |  4 ++--
 fs/nilfs2/segbuf.c |  2 +-
 fs/xfs/xfs_aops.c  |  3 +--
 include/linux/bio.h|  1 -
 16 files changed, 18 insertions(+), 74 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index da15e9a..f28ca16 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -692,29 +692,6 @@ integrity_clone:
 EXPORT_SYMBOL(bio_clone_bioset);
 
 /**
- * bio_get_nr_vecs - return approx number of vecs
- * @bdev:  I/O target
- *
- * Return the approximate number of pages we can send to this target.
- * There's no guarantee that you will be able to fit this number of pages
- * into a bio, it does not account for dynamic restrictions that vary
- * on offset.
- */
-int bio_get_nr_vecs(struct block_device *bdev)
-{
-   struct request_queue *q = bdev_get_queue(bdev);
-   int nr_pages;
-
-   nr_pages = min_t(unsigned,
-queue_max_segments(q),
-queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
-
-   return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
-
-}
-EXPORT_SYMBOL(bio_get_nr_vecs);
-
-/**
  * bio_add_pc_page -   attempt to add page to bio
  * @q: the target queue
  * @bio: destination bio
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 74adcd2..7d64272 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -314,7 +314,7 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
num_bvecs = 1;
else
-   num_bvecs = min_t(int, bio_get_nr_vecs(where->bdev),
+   num_bvecs = min_t(int, BIO_MAX_PAGES,
  dm_sector_div_up(remaining, 
(PAGE_SIZE >> SECTOR_SHIFT)));
 
bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324..449c752 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -97,10 +97,7 @@ static inline int compressed_bio_size(struct btrfs_root 
*root,
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
u64 first_byte, gfp_t gfp_flags)
 {
-   int nr_vecs;
-
-   nr_vecs = bio_get_nr_vecs(bdev);
-   return btrfs_bio_alloc(bdev, first_byte >> 9, nr_vecs, gfp_flags);
+   return btrfs_bio_alloc(bdev, first_byte >> 9, BIO_MAX_PAGES, gfp_flags);
 }
 
 static int check_compressed_csum(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02d0581..ba89efd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2802,9 +2802,7 @@ static int submit_extent_page(int rw, struct 
extent_io_tree *tree,
 {
int ret = 0;
struct bio *bio;
-   int nr;
int contig = 0;
-   int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED;
int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE);
 
@@ -2829,12 +2827,9 @@ static int submit_extent_page(int rw, struct 
extent_io_tree *tree,
return 0;
}
}
-   if (this_compressed)
-   nr = BIO_MAX_PAGES;
-   else
-   nr = bio_get_nr_vecs(bdev);
 
-   bio = btrfs_bio_alloc(bdev, sector, nr, GFP_NOFS | __GFP_HIGH);
+   bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES,
+   GFP_NOFS | __GFP_HIGH);
if (!bio)
return -ENOMEM;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 855935f..d66b9a3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7959,8 +7959,7 @@ out:
 static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
   u64 first_sector, gfp_t gfp_flags)
 {
-   int nr_vecs = bio_get_nr_vecs(bdev);
-   return btrfs_bio_alloc(bdev, first_sector, nr_vecs, gfp_flags);
+   return btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
 }
 
 static inline int btrfs_lookup_and_bind_dio_csum(struct btrfs_root *root,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f2feab..aab0b9a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -454,27 +454,14 @@ struct 
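
Note: for reference, the estimate that the removed bio_get_nr_vecs()
produced can be reproduced in user space. This is a minimal sketch, not
kernel code. It assumes 4 KiB pages and a BIO_MAX_PAGES of 256; both values
are assumptions hard-coded as SKETCH_* constants, and old_nr_vecs_estimate
and min_u are hypothetical names.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE     4096u  /* assumption: 4 KiB pages */
#define SKETCH_BIO_MAX_PAGES 256u   /* assumption: BIO_MAX_PAGES in this era */

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/*
 * Mirror of the removed calculation: the smaller of the queue's segment
 * limit and the number of pages covered by its sector limit (plus one),
 * capped at the largest bio the allocator can provide.
 */
static unsigned old_nr_vecs_estimate(unsigned max_segments,
                                     unsigned max_sectors)
{
    unsigned sectors_per_page = SKETCH_PAGE_SIZE >> 9;
    unsigned nr_pages = min_u(max_segments,
                              max_sectors / sectors_per_page + 1);

    return min_u(nr_pages, SKETCH_BIO_MAX_PAGES);
}
```

For a queue with 168 segments and 255 sectors the estimate is 32 vecs. The
callers converted in this patch simply pass BIO_MAX_PAGES instead.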

[PATCH v5 03/11] bcache: remove driver private bio splitting code

2015-07-06 Thread mlin
From: Kent Overstreet 

The bcache driver has always accepted arbitrarily large bios and split
them internally.  Now that every driver must accept arbitrarily large
bios, this code isn't necessary anymore.

Cc: linux-bca...@vger.kernel.org
Signed-off-by: Kent Overstreet 
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 drivers/md/bcache/bcache.h|  18 
 drivers/md/bcache/io.c| 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c |  32 +-
 drivers/md/bcache/util.h  |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 7 files changed, 18 insertions(+), 161 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 04f7bc2..6b420a5 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -243,19 +243,6 @@ struct keybuf {
DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-   struct bio_set  *bio_split;
-   mempool_t   *bio_split_hook;
-};
-
-struct bio_split_hook {
-   struct closure  cl;
-   struct bio_split_pool   *p;
-   struct bio  *bio;
-   bio_end_io_t*bi_end_io;
-   void*bi_private;
-};
-
 struct bcache_device {
struct closure  cl;
 
@@ -288,8 +275,6 @@ struct bcache_device {
int (*cache_miss)(struct btree *, struct search *,
  struct bio *, unsigned);
int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct io {
@@ -454,8 +439,6 @@ struct cache {
atomic_long_t   meta_sectors_written;
atomic_long_t   btree_sectors_written;
atomic_long_t   sectors_written;
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct gc_stat {
@@ -873,7 +856,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, 
const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, 
unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index cb64e64a..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-   struct bio_vec bv;
-   struct bvec_iter iter;
-   unsigned ret = 0, seg = 0;
-
-   if (bio->bi_rw & REQ_DISCARD)
-   return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-   bio_for_each_segment(bv, bio, iter) {
-   struct bvec_merge_data bvm = {
-   .bi_bdev= bio->bi_bdev,
-   .bi_sector  = bio->bi_iter.bi_sector,
-   .bi_size= ret << 9,
-   .bi_rw  = bio->bi_rw,
-   };
-
-   if (seg == min_t(unsigned, BIO_MAX_PAGES,
-queue_max_segments(q)))
-   break;
-
-   if (q->merge_bvec_fn &&
-   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-   break;
-
-   seg++;
-   ret += bv.bv_len >> 9;
-   }
-
-   ret = min(ret, queue_max_sectors(q));
-
-   WARN_ON(!ret);
-   ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-   return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   s->bio->bi_end_io = s->bi_end_io;
-   s->bio->bi_private = s->bi_private;
-   bio_endio(s->bio, 0);
-
-   closure_debug_destroy(&s->cl);
-   mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-   struct closure *cl = bio->bi_private;
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   if (error)
-   clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-   bio_put(bio);
-   closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-   struct bio_split_hook *s;
-   struct bio *n;
-
-   if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-   goto submit;
-
-   if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-   goto submit;
-
-   s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-   closure_init(&s->cl, NULL);
-
-   s->bio  = bio;
-   s->p= p;
-   

[PATCH v5 08/11] block: kill merge_bvec_fn() completely

2015-07-06 Thread mlin
From: Kent Overstreet 

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe 
Cc: Lars Ellenberg 
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina 
Cc: Yehuda Sadeh 
Cc: Sage Weil 
Cc: Alex Elder 
Cc: ceph-de...@vger.kernel.org
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Neil Brown 
Cc: linux-r...@vger.kernel.org
Cc: Christoph Hellwig 
Cc: "Martin K. Petersen" 
Acked-by: NeilBrown  (for the 'md' bits)
Signed-off-by: Kent Overstreet 
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 block/blk-merge.c  |  17 +-
 block/blk-settings.c   |  22 ---
 drivers/block/drbd/drbd_int.h  |   1 -
 drivers/block/drbd/drbd_main.c |   1 -
 drivers/block/drbd/drbd_req.c  |  35 
 drivers/block/pktcdvd.c|  21 ---
 drivers/block/rbd.c|  47 ---
 drivers/md/dm-cache-target.c   |  21 ---
 drivers/md/dm-crypt.c  |  16 --
 drivers/md/dm-era-target.c |  15 -
 drivers/md/dm-flakey.c |  16 --
 drivers/md/dm-linear.c |  16 --
 drivers/md/dm-log-writes.c |  16 --
 drivers/md/dm-raid.c   |  19 --
 drivers/md/dm-snap.c   |  15 -
 drivers/md/dm-stripe.c |  21 ---
 drivers/md/dm-table.c  |   8 ---
 drivers/md/dm-thin.c   |  31 --
 drivers/md/dm-verity.c |  16 --
 drivers/md/dm.c| 127 +
 drivers/md/dm.h|   2 -
 drivers/md/linear.c|  43 --
 drivers/md/md.c|  26 -
 drivers/md/md.h|  12 
 drivers/md/multipath.c |  21 ---
 drivers/md/raid0.c |  56 --
 drivers/md/raid0.h |   2 -
 drivers/md/raid1.c |  58 +--
 drivers/md/raid10.c| 121 +--
 drivers/md/raid5.c |  32 ---
 include/linux/blkdev.h |  10 
 include/linux/device-mapper.h  |   4 --
 32 files changed, 9 insertions(+), 859 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3707f30..1f5dfa0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -69,24 +69,13 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
struct bio *split;
struct bio_vec bv, bvprv;
struct bvec_iter iter;
-   unsigned seg_size = 0, nsegs = 0;
+   unsigned seg_size = 0, nsegs = 0, sectors = 0;
int prev = 0;
 
-   struct bvec_merge_data bvm = {
-   .bi_bdev= bio->bi_bdev,
-   .bi_sector  = bio->bi_iter.bi_sector,
-   .bi_size= 0,
-   .bi_rw  = bio->bi_rw,
-   };
-
bio_for_each_segment(bv, bio, iter) {
-   if (q->merge_bvec_fn &&
-   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-   goto split;
-
-   bvm.bi_size += bv.bv_len;
+   sectors += bv.bv_len >> 9;
 
-   if (bvm.bi_size >> 9 > queue_max_sectors(q))
+   if (sectors > queue_max_sectors(q))
goto split;
 
/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..e90d477 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -53,28 +53,6 @@ void blk_queue_unprep_rq(struct request_queue *q, 
unprep_rq_fn *ufn)
 }
 EXPORT_SYMBOL(blk_queue_unprep_rq);
 
-/**
- * blk_queue_merge_bvec - set a merge_bvec function for queue
- * @q: queue
- * @mbfn:  merge_bvec_fn
- *
- * Usually queues have static limitations on the max sectors or segments that
- * we can put in a request. Stacking drivers may have some settings that
- * are dynamic, and thus we have to query the queue whether it is ok to
- * add a new bio_vec to a bio at a given offset or not. If the block device
- * has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Note that a block device *must* allow a
- * single page to be added to an empty bio. The block device driver may want
- * to use the bio_split() function to deal with these bio's. By default
- * no merge_bvec_fn is defined for a queue, and only the fixed limits are
- * honored.
- */
-void blk_queue_merge_bvec(struct request_queue *q, merge_bvec_fn *mbfn)
-{
-   q->merge_bvec_fn = mbfn;
-}
-EXPORT_SYMBOL(blk_queue_merge_bvec);
-
 void blk_queue_softirq_done(struct request_queue *q, softirq_done_fn *fn)
 {
q->softirq_done_fn = fn;
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index efd19c2..7ac66f3 100644
--- a/drivers/block/drbd/drbd_int.h
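
Note: with ->merge_bvec_fn() gone, the sector check in
blk_bio_segment_split() reduces to a running sum over the segments. The
following is a minimal user-space sketch of that one check only. The real
function also enforces segment-count and segment-size limits, which are
left out here, and split_index is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk the segment lengths (in bytes) and return how many leading
 * segments fit before the running sector count exceeds max_sectors.
 * A return value equal to nsegs means no split is needed.
 */
static size_t split_index(const unsigned *seg_bytes, size_t nsegs,
                          unsigned max_sectors)
{
    unsigned sectors = 0;

    for (size_t i = 0; i < nsegs; i++) {
        sectors += seg_bytes[i] >> 9;  /* bytes to 512-byte sectors */
        if (sectors > max_sectors)
            return i;                  /* split before segment i */
    }
    return nsegs;
}
```

Three 4096-byte segments against a 16-sector limit give a split index of 2:
the first two segments (16 sectors) form the front piece and the third is
left for the remainder.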

[PATCH v5 07/11] md/raid5: get rid of bio_fits_rdev()

2015-07-06 Thread mlin
From: Kent Overstreet 

Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
performed by blk_queue_split() in md_make_request().

Cc: Neil Brown 
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown 
Signed-off-by: Kent Overstreet 
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 drivers/md/raid5.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8377e72..8bdf81a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4780,25 +4780,6 @@ static void raid5_align_endio(struct bio *bi, int error)
add_bio_to_retry(raid_bi, conf);
 }
 
-static int bio_fits_rdev(struct bio *bi)
-{
-   struct request_queue *q = bdev_get_queue(bi->bi_bdev);
-
-   if (bio_sectors(bi) > queue_max_sectors(q))
-   return 0;
-   blk_recount_segments(q, bi);
-   if (bi->bi_phys_segments > queue_max_segments(q))
-   return 0;
-
-   if (q->merge_bvec_fn)
-   /* it's too hard to apply the merge_bvec_fn at this stage,
-* just just give up
-*/
-   return 0;
-
-   return 1;
-}
-
 static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
struct r5conf *conf = mddev->private;
@@ -4852,11 +4833,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, 
struct bio *raid_bio)
align_bi->bi_bdev =  rdev->bdev;
__clear_bit(BIO_SEG_VALID, &align_bi->bi_flags);
 
-   if (!bio_fits_rdev(align_bi) ||
-   is_badblock(rdev, align_bi->bi_iter.bi_sector,
+   if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
bio_sectors(align_bi),
&first_bad, &bad_sectors)) {
-   /* too big in some way, or has a known bad block */
bio_put(align_bi);
rdev_dec_pending(rdev, mddev);
return 0;
-- 
1.9.1



[PATCH v5 02/11] block: simplify bio_add_page()

2015-07-06 Thread mlin
From: Kent Overstreet 

Since generic_make_request() can now handle arbitrarily sized bios, all we
have to do is make sure the bvec array doesn't overflow.
__bio_add_page() no longer needs to call ->merge_bvec_fn(), so we can
get rid of the unnecessary code paths.

Removing the call to ->merge_bvec_fn() is also fine, as no driver that
implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
method.

Cc: Christoph Hellwig 
Cc: Jens Axboe 
Signed-off-by: Kent Overstreet 
[dpark: rebase and resolve merge conflicts, change a couple of comments,
 make bio_add_page() warn once upon a cloned bio.]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 block/bio.c | 135 +---
 1 file changed, 55 insertions(+), 80 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2a00d34..da15e9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -714,9 +714,23 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
- *page, unsigned int len, unsigned int offset,
- unsigned int max_sectors)
+/**
+ * bio_add_pc_page -   attempt to add page to bio
+ * @q: the target queue
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist. This can fail for a
+ * number of reasons, such as the bio being full or target block device
+ * limitations. The target block device must allow bio's up to PAGE_SIZE,
+ * so it is always possible to add a single page to an empty bio.
+ *
+ * This should only be used by REQ_PC bios.
+ */
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+   *page, unsigned int len, unsigned int offset)
 {
int retried_segments = 0;
struct bio_vec *bvec;
@@ -727,7 +741,7 @@ static int __bio_add_page(struct request_queue *q, struct 
bio *bio, struct page
if (unlikely(bio_flagged(bio, BIO_CLONED)))
return 0;
 
-   if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
+   if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
return 0;
 
/*
@@ -740,28 +754,7 @@ static int __bio_add_page(struct request_queue *q, struct 
bio *bio, struct page
 
if (page == prev->bv_page &&
offset == prev->bv_offset + prev->bv_len) {
-   unsigned int prev_bv_len = prev->bv_len;
prev->bv_len += len;
-
-   if (q->merge_bvec_fn) {
-   struct bvec_merge_data bvm = {
-   /* prev_bvec is already charged in
-  bi_size, discharge it in order to
-  simulate merging updated prev_bvec
-  as new bvec. */
-   .bi_bdev = bio->bi_bdev,
-   .bi_sector = bio->bi_iter.bi_sector,
-   .bi_size = bio->bi_iter.bi_size -
-   prev_bv_len,
-   .bi_rw = bio->bi_rw,
-   };
-
-   if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-   prev->bv_len -= len;
-   return 0;
-   }
-   }
-
bio->bi_iter.bi_size += len;
goto done;
}
@@ -804,27 +797,6 @@ static int __bio_add_page(struct request_queue *q, struct 
bio *bio, struct page
blk_recount_segments(q, bio);
}
 
-   /*
-* if queue has other restrictions (eg varying max sector size
-* depending on offset), it can specify a merge_bvec_fn in the
-* queue to get further control
-*/
-   if (q->merge_bvec_fn) {
-   struct bvec_merge_data bvm = {
-   .bi_bdev = bio->bi_bdev,
-   .bi_sector = bio->bi_iter.bi_sector,
-   .bi_size = bio->bi_iter.bi_size - len,
-   .bi_rw = bio->bi_rw,
-   };
-
-   /*
-* merge_bvec_fn() returns number of bytes it can accept
-* at this offset
-*/
-   if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
-   goto failed;
-   }
-
/* If we may be able to merge these biovecs, force a recount */
if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
bio->bi_flags &= ~(1 << BIO_SEG_VALID);
@@ -841,28 +813,6 @@ static int 
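
Note: the commit message says that bio_add_page() now only has to make sure
the bvec array doesn't overflow. The patch text is cut off above, so the
following is not the patch's code but a minimal user-space model of that
idea. All sketch_* names and the capacity of 4 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VECS 4  /* hypothetical capacity, stands in for bi_max_vecs */

struct sketch_vec {
    const void *page;
    unsigned len, offset;
};

struct sketch_bio {
    struct sketch_vec vecs[SKETCH_MAX_VECS];
    unsigned vcnt;  /* entries in use */
    unsigned size;  /* total bytes queued */
};

/* Returns len on success, or 0 if the vec array is full. */
static unsigned sketch_add_page(struct sketch_bio *bio, const void *page,
                                unsigned len, unsigned offset)
{
    if (bio->vcnt > 0) {
        struct sketch_vec *prev = &bio->vecs[bio->vcnt - 1];

        /* Extend the previous entry when the new range continues it. */
        if (page == prev->page && offset == prev->offset + prev->len) {
            prev->len += len;
            bio->size += len;
            return len;
        }
    }

    /* The only remaining limit: room in the vec array. */
    if (bio->vcnt >= SKETCH_MAX_VECS)
        return 0;

    bio->vecs[bio->vcnt++] = (struct sketch_vec){ page, len, offset };
    bio->size += len;
    return len;
}

/* Usage: try to add `tries` distinct pages and count the successes. */
static unsigned sketch_count_accepted(unsigned tries)
{
    static char pages[16];  /* distinct addresses stand in for pages */
    struct sketch_bio bio = { .vcnt = 0, .size = 0 };
    unsigned accepted = 0;

    for (unsigned i = 0; i < tries && i < 16; i++)
        if (sketch_add_page(&bio, &pages[i], 512, 0) == 512)
            accepted++;
    return accepted;
}
```

In this model no queue limit is consulted at all: a caller fills the bio
until the array is full and relies on the block layer to split it later.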

[PATCH v5 04/11] btrfs: remove bio splitting and merge_bvec_fn() calls

2015-07-06 Thread mlin
From: Kent Overstreet 

Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling ->merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Kent Overstreet 
Signed-off-by: Chris Mason 
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 fs/btrfs/volumes.c | 72 --
 1 file changed, 72 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b438b4..fd25b81 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5833,34 +5833,6 @@ static noinline void btrfs_schedule_bio(struct 
btrfs_root *root,
 &device->work);
 }
 
-static int bio_size_ok(struct block_device *bdev, struct bio *bio,
-  sector_t sector)
-{
-   struct bio_vec *prev;
-   struct request_queue *q = bdev_get_queue(bdev);
-   unsigned int max_sectors = queue_max_sectors(q);
-   struct bvec_merge_data bvm = {
-   .bi_bdev = bdev,
-   .bi_sector = sector,
-   .bi_rw = bio->bi_rw,
-   };
-
-   if (WARN_ON(bio->bi_vcnt == 0))
-   return 1;
-
-   prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-   if (bio_sectors(bio) > max_sectors)
-   return 0;
-
-   if (!q->merge_bvec_fn)
-   return 1;
-
-   bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
-   if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
-   return 0;
-   return 1;
-}
-
 static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
  struct bio *bio, u64 physical, int dev_nr,
  int rw, int async)
@@ -5894,38 +5866,6 @@ static void submit_stripe_bio(struct btrfs_root *root, 
struct btrfs_bio *bbio,
btrfsic_submit_bio(rw, bio);
 }
 
-static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
- struct bio *first_bio, struct btrfs_device *dev,
- int dev_nr, int rw, int async)
-{
-   struct bio_vec *bvec = first_bio->bi_io_vec;
-   struct bio *bio;
-   int nr_vecs = bio_get_nr_vecs(dev->bdev);
-   u64 physical = bbio->stripes[dev_nr].physical;
-
-again:
-   bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
-   if (!bio)
-   return -ENOMEM;
-
-   while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
-   if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
-   u64 len = bio->bi_iter.bi_size;
-
-   atomic_inc(&bbio->stripes_pending);
-   submit_stripe_bio(root, bbio, bio, physical, dev_nr,
- rw, async);
-   physical += len;
-   goto again;
-   }
-   bvec++;
-   }
-
-   submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
-   return 0;
-}
-
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 {
atomic_inc(&bbio->error);
@@ -5998,18 +5938,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, 
struct bio *bio,
continue;
}
 
-   /*
-* Check and see if we're ok with this bio based on it's size
-* and offset with the given device.
-*/
-   if (!bio_size_ok(dev->bdev, first_bio,
-bbio->stripes[dev_nr].physical >> 9)) {
-   ret = breakup_stripe_bio(root, bbio, first_bio, dev,
-dev_nr, rw, async_submit);
-   BUG_ON(ret);
-   continue;
-   }
-
if (dev_nr < total_devs - 1) {
bio = btrfs_bio_clone(first_bio, GFP_NOFS);
BUG_ON(!bio); /* -ENOMEM */
-- 
1.9.1



[PATCH v5 09/11] fs: use helper bio_add_page() instead of open coding on bi_io_vec

2015-07-06 Thread mlin
From: Kent Overstreet 

Call the pre-defined helper bio_add_page() instead of open-coding the
setup of bi_io_vec[]. This makes some parts of the filesystems and
mm/page_io.c simpler than before.

Acked-by: Dave Kleikamp 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Kent Overstreet 
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 fs/buffer.c |  7 ++-
 fs/jfs/jfs_logmgr.c | 14 --
 mm/page_io.c|  8 +++-
 3 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1cf7a53..95996ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3046,12 +3046,9 @@ static int submit_bh_wbc(int rw, struct buffer_head *bh,
 
bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
-   bio->bi_io_vec[0].bv_page = bh->b_page;
-   bio->bi_io_vec[0].bv_len = bh->b_size;
-   bio->bi_io_vec[0].bv_offset = bh_offset(bh);
 
-   bio->bi_vcnt = 1;
-   bio->bi_iter.bi_size = bh->b_size;
+   bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
+   BUG_ON(bio->bi_iter.bi_size != bh->b_size);
 
bio->bi_end_io = end_bio_bh_io_sync;
bio->bi_private = bh;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index bc462dc..46fae06 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1999,12 +1999,9 @@ static int lbmRead(struct jfs_log * log, int pn, struct 
lbuf ** bpp)
 
bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_bdev = log->bdev;
-   bio->bi_io_vec[0].bv_page = bp->l_page;
-   bio->bi_io_vec[0].bv_len = LOGPSIZE;
-   bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
-   bio->bi_vcnt = 1;
-   bio->bi_iter.bi_size = LOGPSIZE;
+   bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+   BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
@@ -2145,12 +2142,9 @@ static void lbmStartIO(struct lbuf * bp)
bio = bio_alloc(GFP_NOFS, 1);
bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_bdev = log->bdev;
-   bio->bi_io_vec[0].bv_page = bp->l_page;
-   bio->bi_io_vec[0].bv_len = LOGPSIZE;
-   bio->bi_io_vec[0].bv_offset = bp->l_offset;
 
-   bio->bi_vcnt = 1;
-   bio->bi_iter.bi_size = LOGPSIZE;
+   bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
+   BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
 
bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
diff --git a/mm/page_io.c b/mm/page_io.c
index 520baa4..194081b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -33,12 +33,10 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
if (bio) {
bio->bi_iter.bi_sector = map_swap_page(page, &bio->bi_bdev);
bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
-   bio->bi_io_vec[0].bv_page = page;
-   bio->bi_io_vec[0].bv_len = PAGE_SIZE;
-   bio->bi_io_vec[0].bv_offset = 0;
-   bio->bi_vcnt = 1;
-   bio->bi_iter.bi_size = PAGE_SIZE;
bio->bi_end_io = end_io;
+
+   bio_add_page(bio, page, PAGE_SIZE, 0);
+   BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE);
}
return bio;
 }
-- 
1.9.1



[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-06 Thread mlin
From: Kent Overstreet 

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.
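
As a rough illustration of what splitting at the queue level means, the
following userspace sketch cuts an oversized request into pieces no larger
than a device limit. It is not the blk_queue_split() implementation, which
also has to respect segment counts and other queue limits; struct io_range,
split_front() and the max_sectors parameter are names made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio: a start sector and a length. */
struct io_range {
	uint64_t sector;
	unsigned nr_sectors;
};

/* If 'io' exceeds max_sectors, carve the first max_sectors off the front,
 * advance 'io' past them and return the carved piece. Otherwise return the
 * whole range and leave 'io' empty. A caller loops until io->nr_sectors
 * reaches zero. */
static struct io_range split_front(struct io_range *io, unsigned max_sectors)
{
	struct io_range front = *io;

	assert(max_sectors > 0);
	if (io->nr_sectors > max_sectors)
		front.nr_sectors = max_sectors;

	io->sector += front.nr_sectors;
	io->nr_sectors -= front.nr_sectors;
	return front;
}
```

With a limit of 128 sectors, a 300-sector request starting at sector 100 comes
out as pieces of 128, 128 and 44 sectors. The code that built the request
never needed to know the limit, which is the point of moving the split below
generic_make_request().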

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: Ming Lei 
Cc: Neil Brown 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg 
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina 
Cc: Geoff Levand 
Cc: Jim Paris 
Cc: Joshua Morris 
Cc: Philip Kelleher 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Oleg Drokin 
Cc: Andreas Dilger 
Acked-by: NeilBrown  (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet 
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 block/blk-core.c|  19 ++--
 block/blk-merge.c   | 159 ++--
 block/blk-mq.c  |   4 +
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_req.c   |   2 +
 drivers/block/pktcdvd.c |   6 +-
 drivers/block/ps3vram.c |   2 +
 drivers/block/rsxx/dev.c|   2 +
 drivers/block/umem.c|   2 +
 drivers/block/zram/zram_drv.c   |   2 +
 drivers/md/dm.c |   2 +
 drivers/md/md.c |   2 +
 drivers/s390/block/dcssblk.c|   2 +
 drivers/s390/block/xpram.c  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h  |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (q->id < 0)
goto fail_q;
 
+   q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+   if (!q->bio_split)
+   goto fail_id;
+
q->backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
err = bdi_init(&q->backing_dev_info);
if (err)
-   goto fail_id;
+   goto fail_split;
 
setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
 fail_bdi:
bdi_destroy(&q->backing_dev_info);
+fail_split:
+   bioset_free(q->bio_split);
 fail_id:
ida_simple_remove(&blk_queue_ida, q->id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct 
bio *bio)
struct request *req;
unsigned int request_count = 0;
 
+   blk_queue_split(q, &bio, q->bio_split);
+
/*
 * low level driver can indicate that it wants pages above a
 * certain limit bounced to low memory (ie for highmem, or even
@@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
-   if (likely(bio_is_rw(bio) &&
-  nr_sectors > 

[PATCH v5 06/11] md/raid5: split bio for chunk_aligned_read

2015-07-06 Thread mlin
From: Ming Lin 

If a read request fits entirely in a chunk, it will be passed directly to the
underlying device (providing it hasn't failed of course).  If it doesn't fit,
the slightly less efficient path that uses the stripe_cache is used.
Requests that get to the stripe cache are always completely split up as
necessary.

So with RAID5, ripping out the merge_bvec_fn doesn't cause it to stop working,
but could cause it to take the less efficient path more often.

All that is needed to manage this is for 'chunk_aligned_read' to do some bio
splitting, much like the RAID0 code does.

Cc: Neil Brown 
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown 
Signed-off-by: Ming Lin 
---
 drivers/md/raid5.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 59e44e9..8377e72 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4799,7 +4799,7 @@ static int bio_fits_rdev(struct bio *bi)
return 1;
 }
 
-static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
+static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
struct r5conf *conf = mddev->private;
int dd_idx;
@@ -4808,7 +4808,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct 
bio * raid_bio)
sector_t end_sector;
 
if (!in_chunk_boundary(mddev, raid_bio)) {
-   pr_debug("chunk_aligned_read : non aligned\n");
+   pr_debug("%s: non aligned\n", __func__);
return 0;
}
/*
@@ -4885,6 +4885,31 @@ static int chunk_aligned_read(struct mddev *mddev, 
struct bio * raid_bio)
}
 }
 
+static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio 
*raid_bio)
+{
+   struct bio *split;
+
+   do {
+   sector_t sector = raid_bio->bi_iter.bi_sector;
+   unsigned chunk_sects = mddev->chunk_sectors;
+   unsigned sectors = chunk_sects - (sector & (chunk_sects-1));
+
+   if (sectors < bio_sectors(raid_bio)) {
+   split = bio_split(raid_bio, sectors, GFP_NOIO, 
fs_bio_set);
+   bio_chain(split, raid_bio);
+   } else
+   split = raid_bio;
+
+   if (!raid5_read_one_chunk(mddev, split)) {
+   if (split != raid_bio)
+   generic_make_request(raid_bio);
+   return split;
+   }
+   } while (split != raid_bio);
+
+   return NULL;
+}
+
 /* __get_priority_stripe - get the next stripe to process
  *
  * Full stripe writes are allowed to pass preread active stripes up until
@@ -5162,9 +5187,11 @@ static void make_request(struct mddev *mddev, struct bio 
* bi)
 * data on failed drives.
 */
if (rw == READ && mddev->degraded == 0 &&
-mddev->reshape_position == MaxSector &&
-chunk_aligned_read(mddev,bi))
-   return;
+   mddev->reshape_position == MaxSector) {
+   bi = chunk_aligned_read(mddev, bi);
+   if (!bi)
+   return;
+   }
 
if (unlikely(bi->bi_rw & REQ_DISCARD)) {
make_discard_request(mddev, bi);
-- 
1.9.1
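
The splitting loop in chunk_aligned_read() rests on one piece of arithmetic:
the number of sectors left before the next chunk boundary. Below is a minimal
userspace sketch of that calculation, assuming (as the mask in the patch does)
that chunk_sectors is a power of two. The helpers sectors_to_chunk_end() and
count_chunk_pieces() are hypothetical names for illustration, not kernel
functions, and the second one models only the case where every piece is
accepted by the per-chunk read path.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors from 'sector' up to the end of the chunk that contains it.
 * Mirrors: chunk_sects - (sector & (chunk_sects - 1)).
 * Assumes chunk_sects is a non-zero power of two. */
static unsigned sectors_to_chunk_end(uint64_t sector, unsigned chunk_sects)
{
	assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
	return chunk_sects - (unsigned)(sector & (chunk_sects - 1));
}

/* Count the per-chunk pieces a read of 'nr_sectors' starting at 'sector'
 * is cut into when it is walked one chunk at a time. */
static unsigned count_chunk_pieces(uint64_t sector, unsigned nr_sectors,
				   unsigned chunk_sects)
{
	unsigned pieces = 0;

	while (nr_sectors > 0) {
		unsigned room = sectors_to_chunk_end(sector, chunk_sects);
		unsigned len = room < nr_sectors ? room : nr_sectors;

		sector += len;
		nr_sectors -= len;
		pieces++;
	}
	return pieces;
}
```

For example, with 8-sector chunks a 20-sector read starting at sector 5 is cut
into pieces of 3, 8, 8 and 1 sectors, so count_chunk_pieces(5, 20, 8) yields 4.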



[PATCH v5 00/11] simplify block layer based on immutable biovecs

2015-07-06 Thread mlin
Hi Mike,

On Wed, 2015-06-10 at 17:46 -0400, Mike Snitzer wrote:
> I've been busy getting DM changes for the 4.2 merge window finalized.
> As such I haven't connected with others on the team to discuss this
> issue.
> 
> I'll see if we can make time in the next 2 days.  But I also have
> RHEL-specific kernel deadlines I'm coming up against.
> 
> Seems late to be staging this extensive a change for 4.2... are you
> pushing for this code to land in the 4.2 merge window?  Or do we have
> time to work this further and target the 4.3 merge?
> 

4.2-rc1 is out now.
Would you have time to work together on this for the 4.3 merge?

Fio test results (4.1-rc4/rc7) showed no performance regressions
for HW/SW RAID6 and DM stripe tests.
http://minggr.net/pub/20150608/fio_results/summary.log

v5:
  - rebase on top of 4.2-rc1
  - reorder patch 6,7
  - add NeilBrown's ACKs
  - fix memory leak: free "bio_split" bioset in blk_release_queue()

v4:
  - rebase on top of 4.1-rc4
  - use BIO_POOL_SIZE instead of number 4 for bioset_create()
  - call blk_queue_split() in blk_mq_make_request()
  - call blk_queue_split() in zram_make_request()
  - add patch "block: remove bio_get_nr_vecs()"
  - remove split code in blkdev_issue_discard()
  - drop patch "md/raid10: make sync_request_write() call bio_copy_data()".
NeilBrown queued it.
  - drop patch "block: allow __blk_queue_bounce() to handle bios larger than
    BIO_MAX_PAGES".
    Will send it separately

v3:
  - rebase on top of 4.1-rc2
  - support for QUEUE_FLAG_SG_GAPS
  - update commit logs of patch 2&4
  - split bio for chunk_aligned_read

v2: https://lkml.org/lkml/2015/4/28/28
v1: https://lkml.org/lkml/2014/12/22/128

This is the 5th attempt at simplifying the block layer based on immutable
biovecs. Immutable biovecs, implemented by Kent Overstreet, have been
available in mainline since v3.14. Their original goal was actually to make
generic_make_request() accept arbitrarily sized bios, and to push the
splitting down to the drivers or wherever it's required. See also the
past discussions [1] [2] [3].

This will bring not only performance improvements, but also a large
reduction in code complexity all over the block layer. The performance gain
is possible because bio_add_page() no longer has to check unnecessary
conditions such as queue limits or whether biovecs are mergeable.
Those checks are delegated to the driver level. Kent has said that he
benchmarked the impact of this with fio on a Micron P320h, and that it
showed a clearly positive impact.

Moreover, this patchset also allows a lot of code to be deleted, mainly
because of the removal of merge_bvec_fn() callbacks. Handling bio merging
consistently has always been a delicate issue for stacking block drivers
(e.g. md and bcache). This simplification will help every individual block
driver avoid that issue.

Patches are against 4.2-rc1. These are also available in my git repo at:

  https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
  git://git.kernel.org/pub/scm/linux/kernel/git/mlin/linux.git block-generic-req

This patchset is a prerequisite for other follow-up patchsets, e.g.
multipage biovecs, rewriting plugging, or rewriting direct-IO, which are
excluded this time. That means this patchset should not bring any
regression to end-users.

Comments are welcome.
Ming

[1] https://lkml.org/lkml/2014/11/23/263
[2] https://lkml.org/lkml/2013/11/25/732
[3] https://lkml.org/lkml/2014/2/26/618

Dongsu Park (1):
  Documentation: update notes in biovecs about arbitrarily sized bios

Kent Overstreet (8):
  block: make generic_make_request handle arbitrarily sized bios
  block: simplify bio_add_page()
  bcache: remove driver private bio splitting code
  btrfs: remove bio splitting and merge_bvec_fn() calls
  md/raid5: get rid of bio_fits_rdev()
  block: kill merge_bvec_fn() completely
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: remove bio_get_nr_vecs()

Ming Lin (2):
  block: remove split code in blkdev_issue_discard
  md/raid5: split bio for chunk_aligned_read

 Documentation/block/biovecs.txt |  10 +-
 block/bio.c | 152 ++--
 block/blk-core.c|  19 ++--
 block/blk-lib.c |  73 +++--
 block/blk-merge.c   | 148 +--
 block/blk-mq.c  |   4 +
 block/blk-settings.c|  22 
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_int.h   |   1 -
 drivers/block/drbd/drbd_main.c  |   1 -
 drivers/block/drbd/drbd_req.c   |  37 +--
 drivers/block/pktcdvd.c |  27 +
 drivers/block/ps3vram.c |   2 +
 drivers/block/rbd.c |  47 -
 drivers/block/rsxx/dev.c

[PATCH v5 04/11] btrfs: remove bio splitting and merge_bvec_fn() calls

2015-07-06 Thread mlin
From: Kent Overstreet <kent.overstr...@gmail.com>

Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling ->merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.

Cc: Chris Mason <c...@fb.com>
Cc: Josef Bacik <jba...@fb.com>
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
Signed-off-by: Chris Mason <c...@fb.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 fs/btrfs/volumes.c | 72 --
 1 file changed, 72 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b438b4..fd25b81 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5833,34 +5833,6 @@ static noinline void btrfs_schedule_bio(struct 
btrfs_root *root,
 &device->work);
 }
 
-static int bio_size_ok(struct block_device *bdev, struct bio *bio,
-  sector_t sector)
-{
-   struct bio_vec *prev;
-   struct request_queue *q = bdev_get_queue(bdev);
-   unsigned int max_sectors = queue_max_sectors(q);
-   struct bvec_merge_data bvm = {
-   .bi_bdev = bdev,
-   .bi_sector = sector,
-   .bi_rw = bio->bi_rw,
-   };
-
-   if (WARN_ON(bio->bi_vcnt == 0))
-   return 1;
-
-   prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
-   if (bio_sectors(bio) > max_sectors)
-   return 0;
-
-   if (!q->merge_bvec_fn)
-   return 1;
-
-   bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
-   if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
-   return 0;
-   return 1;
-}
-
 static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
  struct bio *bio, u64 physical, int dev_nr,
  int rw, int async)
@@ -5894,38 +5866,6 @@ static void submit_stripe_bio(struct btrfs_root *root, 
struct btrfs_bio *bbio,
btrfsic_submit_bio(rw, bio);
 }
 
-static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
- struct bio *first_bio, struct btrfs_device *dev,
- int dev_nr, int rw, int async)
-{
-   struct bio_vec *bvec = first_bio->bi_io_vec;
-   struct bio *bio;
-   int nr_vecs = bio_get_nr_vecs(dev->bdev);
-   u64 physical = bbio->stripes[dev_nr].physical;
-
-again:
-   bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
-   if (!bio)
-   return -ENOMEM;
-
-   while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
-   if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
-   u64 len = bio->bi_iter.bi_size;
-
-   atomic_inc(&bbio->stripes_pending);
-   submit_stripe_bio(root, bbio, bio, physical, dev_nr,
- rw, async);
-   physical += len;
-   goto again;
-   }
-   bvec++;
-   }
-
-   submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
-   return 0;
-}
-
 static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 {
atomic_inc(&bbio->error);
@@ -5998,18 +5938,6 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, 
struct bio *bio,
continue;
}
 
-   /*
-* Check and see if we're ok with this bio based on it's size
-* and offset with the given device.
-*/
-   if (!bio_size_ok(dev->bdev, first_bio,
-bbio->stripes[dev_nr].physical >> 9)) {
-   ret = breakup_stripe_bio(root, bbio, first_bio, dev,
-dev_nr, rw, async_submit);
-   BUG_ON(ret);
-   continue;
-   }
-
if (dev_nr < total_devs - 1) {
bio = btrfs_bio_clone(first_bio, GFP_NOFS);
BUG_ON(!bio); /* -ENOMEM */
-- 
1.9.1



[PATCH v5 07/11] md/raid5: get rid of bio_fits_rdev()

2015-07-06 Thread mlin
From: Kent Overstreet <kent.overstr...@gmail.com>

Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
performed by blk_queue_split() in md_make_request().

Cc: Neil Brown <ne...@suse.de>
Cc: linux-r...@vger.kernel.org
Acked-by: NeilBrown <ne...@suse.de>
Signed-off-by: Kent Overstreet <kent.overstr...@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dp...@posteo.net>
Signed-off-by: Ming Lin <min...@ssi.samsung.com>
---
 drivers/md/raid5.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8377e72..8bdf81a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4780,25 +4780,6 @@ static void raid5_align_endio(struct bio *bi, int error)
add_bio_to_retry(raid_bi, conf);
 }
 
-static int bio_fits_rdev(struct bio *bi)
-{
-   struct request_queue *q = bdev_get_queue(bi->bi_bdev);
-
-   if (bio_sectors(bi) > queue_max_sectors(q))
-   return 0;
-   blk_recount_segments(q, bi);
-   if (bi->bi_phys_segments > queue_max_segments(q))
-   return 0;
-
-   if (q->merge_bvec_fn)
-   /* it's too hard to apply the merge_bvec_fn at this stage,
-* just just give up
-*/
-   return 0;
-
-   return 1;
-}
-
 static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 {
struct r5conf *conf = mddev->private;
@@ -4852,11 +4833,9 @@ static int raid5_read_one_chunk(struct mddev *mddev, 
struct bio *raid_bio)
align_bi->bi_bdev =  rdev->bdev;
__clear_bit(BIO_SEG_VALID, &align_bi->bi_flags);
 
-   if (!bio_fits_rdev(align_bi) ||
-   is_badblock(rdev, align_bi->bi_iter.bi_sector,
+   if (is_badblock(rdev, align_bi->bi_iter.bi_sector,
bio_sectors(align_bi),
&first_bad, &bad_sectors)) {
-   /* too big in some way, or has a known bad block */
bio_put(align_bi);
rdev_dec_pending(rdev, mddev);
return 0;
-- 
1.9.1



[PATCH v5 02/11] block: simplify bio_add_page()

2015-07-06 Thread mlin
From: Kent Overstreet kent.overstr...@gmail.com

Since generic_make_request() can now handle arbitrarily sized bios, all we
have to do is make sure the bvec array doesn't overflow.
__bio_add_page() no longer needs to call ->merge_bvec_fn(), so
we can get rid of unnecessary code paths.

Removing the call to ->merge_bvec_fn() is also fine, as no driver that
implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
method.
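
To make the "only guard against bvec array overflow" idea concrete, here is a
small user-space sketch. It is an assumption-laden model, not the patched
function: struct sketch_bio, struct sketch_vec, sketch_add_page() and the
capacity SKETCH_MAX_VECS are all hypothetical, and pages are represented by
plain integers.

```c
#include <assert.h>

#define SKETCH_MAX_VECS 4	/* hypothetical capacity of the vec array */

struct sketch_vec {
	int page;		/* stand-in for a struct page pointer */
	unsigned int len;
	unsigned int offset;
};

struct sketch_bio {
	struct sketch_vec vecs[SKETCH_MAX_VECS];
	unsigned int vcnt;	/* vecs in use */
	unsigned int size;	/* total bytes */
};

/* Returns @len on success, or 0 if the vec array is already full. */
unsigned int sketch_add_page(struct sketch_bio *bio, int page,
			     unsigned int len, unsigned int offset)
{
	if (bio->vcnt > 0) {
		struct sketch_vec *prev = &bio->vecs[bio->vcnt - 1];

		/* Contiguous with the last vec in the same page: extend it. */
		if (page == prev->page &&
		    offset == prev->offset + prev->len) {
			prev->len += len;
			bio->size += len;
			return len;
		}
	}

	/* The only remaining failure mode is running out of vec slots. */
	if (bio->vcnt >= SKETCH_MAX_VECS)
		return 0;

	bio->vecs[bio->vcnt].page = page;
	bio->vecs[bio->vcnt].len = len;
	bio->vecs[bio->vcnt].offset = offset;
	bio->vcnt++;
	bio->size += len;
	return len;
}
```

Note that no queue limit or driver callback is consulted here; in this model
any size restriction would be enforced later, when the bio is submitted.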

Cc: Christoph Hellwig h...@infradead.org
Cc: Jens Axboe ax...@kernel.dk
Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[dpark: rebase and resolve merge conflicts, change a couple of comments,
 make bio_add_page() warn once upon a cloned bio.]
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/bio.c | 135 +---
 1 file changed, 55 insertions(+), 80 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2a00d34..da15e9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -714,9 +714,23 @@ int bio_get_nr_vecs(struct block_device *bdev)
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
-static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
- *page, unsigned int len, unsigned int offset,
- unsigned int max_sectors)
+/**
+ * bio_add_pc_page -   attempt to add page to bio
+ * @q: the target queue
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist. This can fail for a
+ * number of reasons, such as the bio being full or target block device
+ * limitations. The target block device must allow bio's up to PAGE_SIZE,
+ * so it is always possible to add a single page to an empty bio.
+ *
+ * This should only be used by REQ_PC bios.
+ */
+int bio_add_pc_page(struct request_queue *q, struct bio *bio, struct page
+   *page, unsigned int len, unsigned int offset)
 {
int retried_segments = 0;
struct bio_vec *bvec;
@@ -727,7 +741,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
if (unlikely(bio_flagged(bio, BIO_CLONED)))
return 0;
 
-   if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
+   if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
return 0;
 
/*
@@ -740,28 +754,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
 
if (page == prev->bv_page &&
offset == prev->bv_offset + prev->bv_len) {
-   unsigned int prev_bv_len = prev->bv_len;
prev->bv_len += len;
-
-   if (q->merge_bvec_fn) {
-   struct bvec_merge_data bvm = {
-   /* prev_bvec is already charged in
-  bi_size, discharge it in order to
-  simulate merging updated prev_bvec
-  as new bvec. */
-   .bi_bdev = bio->bi_bdev,
-   .bi_sector = bio->bi_iter.bi_sector,
-   .bi_size = bio->bi_iter.bi_size -
-   prev_bv_len,
-   .bi_rw = bio->bi_rw,
-   };
-
-   if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len) {
-   prev->bv_len -= len;
-   return 0;
-   }
-   }
-
bio->bi_iter.bi_size += len;
goto done;
}
@@ -804,27 +797,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
blk_recount_segments(q, bio);
}
 
-   /*
-* if queue has other restrictions (eg varying max sector size
-* depending on offset), it can specify a merge_bvec_fn in the
-* queue to get further control
-*/
-   if (q->merge_bvec_fn) {
-   struct bvec_merge_data bvm = {
-   .bi_bdev = bio->bi_bdev,
-   .bi_sector = bio->bi_iter.bi_sector,
-   .bi_size = bio->bi_iter.bi_size - len,
-   .bi_rw = bio->bi_rw,
-   };
-
-   /*
-* merge_bvec_fn() returns number of bytes it can accept
-* at this offset
-*/
-   if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
-   goto failed;
-   }
-
/* If we may be able to merge these biovecs, force a recount */
if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))

[PATCH v5 08/11] block: kill merge_bvec_fn() completely

2015-07-06 Thread mlin
From: Kent Overstreet kent.overstr...@gmail.com

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.
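
With the callback gone, deciding where a bio must be split reduces to
accumulating sectors and comparing against a fixed queue limit. A minimal
user-space sketch of that scan follows; sketch_split_index() is a hypothetical
helper, segment lengths are passed as a plain array, and segment-count and
gap restrictions are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk the segment lengths (in bytes) and return the index of the first
 * segment that would push the running total past @max_sectors.
 * Returns @nvecs if everything fits, i.e. no split is needed.
 */
size_t sketch_split_index(const unsigned int *bv_len, size_t nvecs,
			  unsigned int max_sectors)
{
	unsigned int sectors = 0;
	size_t i;

	for (i = 0; i < nvecs; i++) {
		sectors += bv_len[i] >> 9;	/* bytes to 512-byte sectors */
		if (sectors > max_sectors)
			return i;
	}
	return nvecs;
}
```

The limit is a static property of the queue in this model, so no per-driver
query is needed at each offset.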

Cc: Jens Axboe ax...@kernel.dk
Cc: Lars Ellenberg drbd-...@lists.linbit.com
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina jkos...@suse.cz
Cc: Yehuda Sadeh yeh...@inktank.com
Cc: Sage Weil s...@inktank.com
Cc: Alex Elder el...@kernel.org
Cc: ceph-de...@vger.kernel.org
Cc: Alasdair Kergon a...@redhat.com
Cc: Mike Snitzer snit...@redhat.com
Cc: dm-de...@redhat.com
Cc: Neil Brown ne...@suse.de
Cc: linux-r...@vger.kernel.org
Cc: Christoph Hellwig h...@infradead.org
Cc: Martin K. Petersen martin.peter...@oracle.com
Acked-by: NeilBrown ne...@suse.de (for the 'md' bits)
Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-merge.c  |  17 +-
 block/blk-settings.c   |  22 ---
 drivers/block/drbd/drbd_int.h  |   1 -
 drivers/block/drbd/drbd_main.c |   1 -
 drivers/block/drbd/drbd_req.c  |  35 
 drivers/block/pktcdvd.c|  21 ---
 drivers/block/rbd.c|  47 ---
 drivers/md/dm-cache-target.c   |  21 ---
 drivers/md/dm-crypt.c  |  16 --
 drivers/md/dm-era-target.c |  15 -
 drivers/md/dm-flakey.c |  16 --
 drivers/md/dm-linear.c |  16 --
 drivers/md/dm-log-writes.c |  16 --
 drivers/md/dm-raid.c   |  19 --
 drivers/md/dm-snap.c   |  15 -
 drivers/md/dm-stripe.c |  21 ---
 drivers/md/dm-table.c  |   8 ---
 drivers/md/dm-thin.c   |  31 --
 drivers/md/dm-verity.c |  16 --
 drivers/md/dm.c| 127 +
 drivers/md/dm.h|   2 -
 drivers/md/linear.c|  43 --
 drivers/md/md.c|  26 -
 drivers/md/md.h|  12 
 drivers/md/multipath.c |  21 ---
 drivers/md/raid0.c |  56 --
 drivers/md/raid0.h |   2 -
 drivers/md/raid1.c |  58 +--
 drivers/md/raid10.c| 121 +--
 drivers/md/raid5.c |  32 ---
 include/linux/blkdev.h |  10 
 include/linux/device-mapper.h  |   4 --
 32 files changed, 9 insertions(+), 859 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3707f30..1f5dfa0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -69,24 +69,13 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
struct bio *split;
struct bio_vec bv, bvprv;
struct bvec_iter iter;
-   unsigned seg_size = 0, nsegs = 0;
+   unsigned seg_size = 0, nsegs = 0, sectors = 0;
int prev = 0;
 
-   struct bvec_merge_data bvm = {
-   .bi_bdev= bio->bi_bdev,
-   .bi_sector  = bio->bi_iter.bi_sector,
-   .bi_size= 0,
-   .bi_rw  = bio->bi_rw,
-   };
-
bio_for_each_segment(bv, bio, iter) {
-   if (q->merge_bvec_fn &&
-   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-   goto split;
-
-   bvm.bi_size += bv.bv_len;
+   sectors += bv.bv_len >> 9;
 
-   if (bvm.bi_size >> 9 > queue_max_sectors(q))
+   if (sectors > queue_max_sectors(q))
goto split;
 
/*
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bf..e90d477 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -53,28 +53,6 @@ void blk_queue_unprep_rq(struct request_queue *q, unprep_rq_fn *ufn)
 }
 EXPORT_SYMBOL(blk_queue_unprep_rq);
 
-/**
- * blk_queue_merge_bvec - set a merge_bvec function for queue
- * @q: queue
- * @mbfn:  merge_bvec_fn
- *
- * Usually queues have static limitations on the max sectors or segments that
- * we can put in a request. Stacking drivers may have some settings that
- * are dynamic, and thus we have to query the queue whether it is ok to
- * add a new bio_vec to a bio at a given offset or not. If the block device
- * has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Note that a block device *must* allow a
- * single page to be added to an empty bio. The block device driver may want
- * to use the bio_split() function to deal with these bio's. By default
- * no merge_bvec_fn is defined for a queue, and only the fixed limits are
- * honored.
- */
-void blk_queue_merge_bvec(struct request_queue *q, merge_bvec_fn *mbfn)
-{
-   q->merge_bvec_fn = mbfn;
-}

[PATCH v5 03/11] bcache: remove driver private bio splitting code

2015-07-06 Thread mlin
From: Kent Overstreet kent.overstr...@gmail.com

The bcache driver has always accepted arbitrarily large bios and split
them internally.  Now that every driver must accept arbitrarily large
bios, this code isn't necessary anymore.

Cc: linux-bca...@vger.kernel.org
Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 drivers/md/bcache/bcache.h|  18 
 drivers/md/bcache/io.c| 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c |  32 +-
 drivers/md/bcache/util.h  |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 7 files changed, 18 insertions(+), 161 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 04f7bc2..6b420a5 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -243,19 +243,6 @@ struct keybuf {
DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-   struct bio_set  *bio_split;
-   mempool_t   *bio_split_hook;
-};
-
-struct bio_split_hook {
-   struct closure  cl;
-   struct bio_split_pool   *p;
-   struct bio  *bio;
-   bio_end_io_t*bi_end_io;
-   void*bi_private;
-};
-
 struct bcache_device {
struct closure  cl;
 
@@ -288,8 +275,6 @@ struct bcache_device {
int (*cache_miss)(struct btree *, struct search *,
  struct bio *, unsigned);
int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct io {
@@ -454,8 +439,6 @@ struct cache {
atomic_long_t   meta_sectors_written;
atomic_long_t   btree_sectors_written;
atomic_long_t   sectors_written;
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct gc_stat {
@@ -873,7 +856,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index cb64e64a..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-   struct bio_vec bv;
-   struct bvec_iter iter;
-   unsigned ret = 0, seg = 0;
-
-   if (bio->bi_rw & REQ_DISCARD)
-   return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-   bio_for_each_segment(bv, bio, iter) {
-   struct bvec_merge_data bvm = {
-   .bi_bdev= bio->bi_bdev,
-   .bi_sector  = bio->bi_iter.bi_sector,
-   .bi_size= ret << 9,
-   .bi_rw  = bio->bi_rw,
-   };
-
-   if (seg == min_t(unsigned, BIO_MAX_PAGES,
-queue_max_segments(q)))
-   break;
-
-   if (q->merge_bvec_fn &&
-   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-   break;
-
-   seg++;
-   ret += bv.bv_len >> 9;
-   }
-
-   ret = min(ret, queue_max_sectors(q));
-
-   WARN_ON(!ret);
-   ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-   return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   s->bio->bi_end_io = s->bi_end_io;
-   s->bio->bi_private = s->bi_private;
-   bio_endio(s->bio, 0);
-
-   closure_debug_destroy(&s->cl);
-   mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-   struct closure *cl = bio->bi_private;
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   if (error)
-   clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-   bio_put(bio);
-   closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-   struct bio_split_hook *s;
-   struct bio *n;
-
-   if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-   goto submit;
-
-   if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-   goto submit;
-
-   s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-   closure_init(&s->cl, NULL);
-
-  

[PATCH v5 05/11] block: remove split code in blkdev_issue_discard

2015-07-06 Thread mlin
From: Ming Lin min...@ssi.samsung.com

The split code in blkdev_issue_discard() can go away now
that any driver that cares does the split.
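
For context, the arithmetic that the removed loop performed per chunk can be
sketched in user space as below. This is only a model of the deleted hunk,
under the assumption that sector_div() divides in place and returns the
remainder; sketch_sector_t and sketch_discard_end() are hypothetical names,
and the sketch assumes granularity is non-zero and alignment is smaller than
granularity.

```c
#include <assert.h>

typedef unsigned long long sketch_sector_t;

/*
 * Return the end sector of the next discard chunk starting at @sector.
 * The chunk is capped at @max_discard_sectors; if that cap forces a split
 * and the end would be misaligned, the end is pulled back to the previous
 * sector whose remainder modulo @granularity equals @alignment.
 */
sketch_sector_t sketch_discard_end(sketch_sector_t sector,
				   sketch_sector_t nr_sects,
				   sketch_sector_t max_discard_sectors,
				   sketch_sector_t granularity,
				   sketch_sector_t alignment)
{
	sketch_sector_t req_sects = nr_sects < max_discard_sectors ?
				    nr_sects : max_discard_sectors;
	sketch_sector_t end_sect = sector + req_sects;

	/* Only realign when this chunk does not finish the whole range. */
	if (req_sects < nr_sects && end_sect % granularity != alignment)
		end_sect = (end_sect - alignment) / granularity * granularity
			   + alignment;

	return end_sect;
}
```

After this patch the caller submits a single bio covering nr_sects, and
splitting of this kind is left to whichever layer actually needs it.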

Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-lib.c | 73 +++--
 1 file changed, 14 insertions(+), 59 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..3bf3c4a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -43,34 +43,17 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
struct blk_plug plug;
 
-   if (!q)
+   if (!q || !nr_sects)
return -ENXIO;
 
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -82,52 +65,24 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
bb.wait = &wait;
 
blk_start_plug(&plug);
-   while (nr_sects) {
-   unsigned int req_sects;
-   sector_t end_sect, tmp;
 
-   bio = bio_alloc(gfp_mask, 1);
-   if (!bio) {
-   ret = -ENOMEM;
-   break;
-   }
+   bio = bio_alloc(gfp_mask, 1);
+   if (!bio) {
+   ret = -ENOMEM;
+   goto out;
+   }
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
-   end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
+   bio->bi_iter.bi_sector = sector;
+   bio->bi_end_io = bio_batch_end_io;
+   bio->bi_bdev = bdev;
+   bio->bi_private = &bb;
 
-   bio->bi_iter.bi_sector = sector;
-   bio->bi_end_io = bio_batch_end_io;
-   bio->bi_bdev = bdev;
-   bio->bi_private = &bb;
+   bio->bi_iter.bi_size = nr_sects << 9;
 
-   bio->bi_iter.bi_size = req_sects << 9;
-   nr_sects -= req_sects;
-   sector = end_sect;
+   atomic_inc(&bb.done);
+   submit_bio(type, bio);
 
-   atomic_inc(&bb.done);
-   submit_bio(type, bio);
-
-   /*
-* We can loop for a long time in here, if someone does
-* full device discards (like mkfs). Be nice and allow
-* us to schedule out to avoid softlocking if preempt
-* is disabled.
-*/
-   cond_resched();
-   }
+out:
blk_finish_plug(&plug);
 
/* Wait for bios in-flight */
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 10/11] block: remove bio_get_nr_vecs()

2015-07-06 Thread mlin
From: Kent Overstreet kent.overstr...@gmail.com

We can always fill up the bio now, so there is no need to estimate the
possible size based on queue parameters.
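
For reference, the estimate that callers no longer need can be modelled in
user space as follows. sketch_nr_vecs() is a hypothetical stand-in for the
removed helper, and the constants are assumptions (4 KiB pages and a
BIO_MAX_PAGES of 256), not values taken from this patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */
#define SKETCH_BIO_MAX_PAGES	256u	/* assumed BIO_MAX_PAGES */

/*
 * Approximate how many pages fit in one bio for a queue with the given
 * segment and sector limits, capped at the maximum vec count of a bio.
 */
unsigned int sketch_nr_vecs(unsigned int max_segments,
			    unsigned int max_sectors)
{
	unsigned int by_sectors = max_sectors / (SKETCH_PAGE_SIZE >> 9) + 1;
	unsigned int nr = max_segments < by_sectors ?
			  max_segments : by_sectors;

	return nr < SKETCH_BIO_MAX_PAGES ? nr : SKETCH_BIO_MAX_PAGES;
}
```

After this patch, callers simply request BIO_MAX_PAGES (or a count derived
from the remaining I/O size) instead of asking the queue.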

Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/bio.c| 23 ---
 drivers/md/dm-io.c |  2 +-
 fs/btrfs/compression.c |  5 +
 fs/btrfs/extent_io.c   |  9 ++---
 fs/btrfs/inode.c   |  3 +--
 fs/btrfs/scrub.c   | 18 ++
 fs/direct-io.c |  2 +-
 fs/ext4/page-io.c  |  3 +--
 fs/ext4/readpage.c |  2 +-
 fs/f2fs/data.c |  2 +-
 fs/gfs2/lops.c |  9 +
 fs/logfs/dev_bdev.c|  4 ++--
 fs/mpage.c |  4 ++--
 fs/nilfs2/segbuf.c |  2 +-
 fs/xfs/xfs_aops.c  |  3 +--
 include/linux/bio.h|  1 -
 16 files changed, 18 insertions(+), 74 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index da15e9a..f28ca16 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -692,29 +692,6 @@ integrity_clone:
 EXPORT_SYMBOL(bio_clone_bioset);
 
 /**
- * bio_get_nr_vecs - return approx number of vecs
- * @bdev:  I/O target
- *
- * Return the approximate number of pages we can send to this target.
- * There's no guarantee that you will be able to fit this number of pages
- * into a bio, it does not account for dynamic restrictions that vary
- * on offset.
- */
-int bio_get_nr_vecs(struct block_device *bdev)
-{
-   struct request_queue *q = bdev_get_queue(bdev);
-   int nr_pages;
-
-   nr_pages = min_t(unsigned,
-queue_max_segments(q),
-queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
-
-   return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
-
-}
-EXPORT_SYMBOL(bio_get_nr_vecs);
-
-/**
  * bio_add_pc_page -   attempt to add page to bio
  * @q: the target queue
  * @bio: destination bio
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 74adcd2..7d64272 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -314,7 +314,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
num_bvecs = 1;
else
-   num_bvecs = min_t(int, bio_get_nr_vecs(where->bdev),
+   num_bvecs = min_t(int, BIO_MAX_PAGES,
  dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));
 
bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324..449c752 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -97,10 +97,7 @@ static inline int compressed_bio_size(struct btrfs_root *root,
 static struct bio *compressed_bio_alloc(struct block_device *bdev,
u64 first_byte, gfp_t gfp_flags)
 {
-   int nr_vecs;
-
-   nr_vecs = bio_get_nr_vecs(bdev);
-   return btrfs_bio_alloc(bdev, first_byte >> 9, nr_vecs, gfp_flags);
+   return btrfs_bio_alloc(bdev, first_byte >> 9, BIO_MAX_PAGES, gfp_flags);
 }
 
 static int check_compressed_csum(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02d0581..ba89efd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2802,9 +2802,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
 {
int ret = 0;
struct bio *bio;
-   int nr;
int contig = 0;
-   int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED;
int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE);
 
@@ -2829,12 +2827,9 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
return 0;
}
}
-   if (this_compressed)
-   nr = BIO_MAX_PAGES;
-   else
-   nr = bio_get_nr_vecs(bdev);
 
-   bio = btrfs_bio_alloc(bdev, sector, nr, GFP_NOFS | __GFP_HIGH);
+   bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES,
+   GFP_NOFS | __GFP_HIGH);
if (!bio)
return -ENOMEM;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 855935f..d66b9a3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7959,8 +7959,7 @@ out:
 static struct bio *btrfs_dio_bio_alloc(struct block_device *bdev,
   u64 first_sector, gfp_t gfp_flags)
 {
-   int nr_vecs = bio_get_nr_vecs(bdev);
-   return btrfs_bio_alloc(bdev, first_sector, nr_vecs, gfp_flags);
+   return btrfs_bio_alloc(bdev, first_sector, BIO_MAX_PAGES, gfp_flags);
 }
 
 static inline int btrfs_lookup_and_bind_dio_csum(struct btrfs_root *root,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9f2feab..aab0b9a 100644
--- 

[PATCH v5 11/11] Documentation: update notes in biovecs about arbitrarily sized bios

2015-07-06 Thread mlin
From: Dongsu Park dp...@posteo.net

Update block/biovecs.txt so that it includes a note on what kind of
effects arbitrarily sized bios would bring to the block layer.
Also fix a trivial typo, bio_iter_iovec.

Cc: Christoph Hellwig h...@infradead.org
Cc: Kent Overstreet kent.overstr...@gmail.com
Cc: Jonathan Corbet cor...@lwn.net
Cc: linux-...@vger.kernel.org
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 Documentation/block/biovecs.txt | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 74a32ad..2568958 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed biovecs so that
 normal code doesn't have to deal with bi_bvec_done.
 
  * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
constructed from the raw biovecs but taking into account bi_bvec_done and
bi_size.
 
@@ -109,3 +109,11 @@ Other implications:
over all the biovecs in the new bio - which is silly as it's not needed.
 
So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
-- 
1.9.1
